

Note: This PowerPoint presentation contains a significant amount of animation to help illustrate the concepts described. SHARE proceedings are usually restricted to Adobe portable-document-format (.pdf) files. If you would like a copy of the original PowerPoint slide show, please see me after the session or send me an email at the address on the cover page.





This presentation reviews the new CPU facilities introduced (mostly) by the IBM zEnterprise 196 series of processors (the one exception is the message-security-assist extension 3 [MSA-X3] which was introduced in the System z10 GA3 machines, but was not previously published).

The major focus is on general instructions used by various high-level languages such as C and Java. The final slides will address a few other facilities available for authorized programs.

If you have a PowerPoint version of the presentation, this slide and the section headings it designates contain hyperlinks to the various topics and subtopics. Each slide containing specific information has an "Index" hyperlink in the bottom-right corner that will return you to the next-higher level of information. (Note: SHARE limits its download page to PDFs; if you want the PowerPoint show, see me after the presentation, or send a note to dgreiner@us.ibm.com.)



Since its introduction in 1964, System/360 and all of its successors have provided 16 general-purpose registers. To alleviate the constraint felt by many programmers, numerous architectural features have been added: the relative-branching (short and long) facilities, the immediate- and extended-immediate-operand facilities, and the long-displacement facility are a few examples. However, the 16-register limit continues to prove daunting to assembler programmers and compiler designers alike.

Although z/Architecture provides 64-bit addressing and arithmetic, many applications continue to operate in the 31-bit addressing mode, and rarely require higher-precision arithmetic than 32 bits. For such programs, the leftmost 32 bits of the 64-bit registers have been of little use ... until now.

The high-word facility provides a means by which selected new instructions can operate on the leftmost 32 bits (bits 0-31) of a general register – independent of the rightmost 32 bits (bits 32-63). This separation extends into address generation performed while in the 24- or 31-bit addressing modes; the updating of the leftmost 32 bits of a general-purpose register, using the high-word instructions, does not affect any pipeline address-generation interlock used by the rightmost 32 bits.
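The split described above can be pictured with a small C model. This is only a sketch: the type `gr_t` and the helper names are made up for illustration, and a 64-bit general register is simply modelled as a `uint64_t` whose bits 0-31 (numbered from the left) form the high word.

```c
#include <stdint.h>

/* Model of one 64-bit general register. Bits are numbered 0-63 from the
   left, so bits 0-31 are the high word and bits 32-63 are the low word.
   gr_t and the helpers below are illustrative names, not an IBM API. */
typedef uint64_t gr_t;

/* Read the high word (bits 0-31). */
static uint32_t gr_high(gr_t r) { return (uint32_t)(r >> 32); }

/* Read the low word (bits 32-63). */
static uint32_t gr_low(gr_t r) { return (uint32_t)r; }

/* Replace the high word; the low word is left unchanged. */
static gr_t gr_set_high(gr_t r, uint32_t v) {
    return ((uint64_t)v << 32) | (r & 0xFFFFFFFFu);
}
```

Updating the high word through `gr_set_high` never disturbs the value returned by `gr_low`, which is the independence the high-word instructions rely on.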

Several of the facilities discussed in this presentation share a common facility bit. Bit 45 indicates the installation of the high-word, interlocked-access, load/store-on-condition, distinct-operands, population-count, and fast-BCR-serialization facilities.
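As a rough illustration of testing such a facility bit, the sketch below checks one bit in a facility list held in memory. It assumes the list has already been obtained by some platform-specific means (for example, STORE FACILITY LIST EXTENDED); the function name, the buffer, and its length are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Test one bit in a facility list. Bit 0 is the leftmost bit of byte 0,
   following the left-to-right bit numbering used throughout these notes.
   `list` and `len` are assumptions: a buffer filled in beforehand by some
   platform-specific means. */
static int facility_installed(const uint8_t *list, size_t len, unsigned bit) {
    size_t byte = bit / 8;
    if (byte >= len) {
        return 0;                         /* bit lies beyond the stored list */
    }
    return (list[byte] >> (7 - bit % 8)) & 1;
}
```

With this numbering, bit 45 is the sixth bit (mask 0x04) of the byte at offset 5, so `facility_installed(list, len, 45)` is the check a program would make before using any of the facilities named above.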

| Instruction                            | Mnemonic | OpCode | 1<sup>st</sup> Operand | 2<sup>nd</sup> Operand     | 3<sup>rd</sup> Operand |
|----------------------------------------|----------|--------|------------------------|----------------------------|------------------------|
| ADD HIGH                               | AHHHR    | B9C8   | R<sub>1</sub>.0-31     | R<sub>2</sub>.0-31         | R<sub>3</sub>.0-31     |
| ADD HIGH                               | AHHLR    | B9D8   | R<sub>1</sub>.0-31     | R<sub>2</sub>.0-31         | R<sub>3</sub>.32-63    |
| ADD IMMEDIATE HIGH                     | AIH      | CC8    | R<sub>1</sub>.0-31     | I<sub>2</sub> [32 bits]    | -                      |
| ADD LOGICAL HIGH                       | ALHHHR   | B9CA   | R<sub>1</sub>.0-31     | R<sub>2</sub>.0-31         | R<sub>3</sub>.0-31     |
| ADD LOGICAL HIGH                       | ALHHLR   | B9DA   | R<sub>1</sub>.0-31     | R<sub>2</sub>.0-31         | R<sub>3</sub>.32-63    |
| ADD LOGICAL WITH SIGNED IMMEDIATE HIGH | ALSIH    | CCA    | R<sub>1</sub>.0-31     | I<sub>2</sub> [32 bits]    | -                      |
| ADD LOGICAL WITH SIGNED IMMEDIATE HIGH | ALSIHN   | CCB    | R<sub>1</sub>.0-31     | I<sub>2</sub> [32 bits]    | -                      |
| BRANCH RELATIVE ON COUNT HIGH          | BRCTH    | CC6    | R<sub>1</sub>.0-31     | RI<sub>2</sub> [16 bits]   | -                      |
| COMPARE HIGH                           | CHHR     | B9CD   | R<sub>1</sub>.0-31     | R<sub>2</sub>.0-31         | -                      |
| COMPARE HIGH                           | CHLR     | B9DD   | R<sub>1</sub>.0-31     | R<sub>2</sub>.32-63        | -                      |
| COMPARE HIGH                           | CHF      | E3CD   | R<sub>1</sub>.0-31     | S20 [32 bits]              | -                      |
| COMPARE IMMEDIATE HIGH                 | CIH      | CCD    | R<sub>1</sub>.0-31     | I<sub>2</sub> [32 bits]    | -                      |

This slide enumerates the first 12 instructions in the high-word facility; the remainder are listed on the following slide. As will be immediately obvious, only a limited subset of instructions is provided to manipulate the high words: ADD, ADD LOGICAL, BRANCH RELATIVE ON COUNT, COMPARE, COMPARE LOGICAL, LOAD BYTE, LOAD HALFWORD, LOAD, LOAD LOGICAL CHARACTER, LOAD LOGICAL HALFWORD, ROTATE THEN INSERT SELECTED BITS, STORE CHARACTER, STORE HALFWORD, STORE, SUBTRACT and SUBTRACT LOGICAL.

Note that many of the arithmetic-operand instructions have distinct operands; that is, the target register is separate from the two source registers.

Also note that, of necessity, certain characters in the mnemonics have become a bit overloaded. The rookie programmer will likely find using the high-word facility challenging. We hope the benefits will be worth it.

High-Word Facility (3):

| Instruction                           | Mnemonic | OpCode | 1<sup>st</sup> Operand                         | 2<sup>nd</sup> Operand  | Other                                            |
|---------------------------------------|----------|--------|------------------------------------------------|-------------------------|--------------------------------------------------|
| COMPARE LOGICAL HIGH                  | CLHHR    | B9CF   | R<sub>1</sub>.0-31                             | R<sub>2</sub>.0-31      | -                                                |
| COMPARE LOGICAL HIGH                  | CLHLR    | B9DF   | R<sub>1</sub>.0-31                             | R<sub>2</sub>.32-63     | -                                                |
| COMPARE LOGICAL HIGH                  | CLHF     | E3CF   | R<sub>1</sub>.0-31                             | S20 [32 bits]           | -                                                |
| COMPARE LOGICAL IMMEDIATE HIGH        | CLIH     | CCF    | R<sub>1</sub>.0-31                             | I<sub>2</sub> [32 bits] | -                                                |
| LOAD BYTE HIGH                        | LBH      | E3C0   | R<sub>1</sub>.24-31                            | S20 [8 bits]            | -                                                |
| LOAD HALFWORD HIGH                    | LHH      | E3C4   | R<sub>1</sub>.16-31                            | S20 [16 bits]           | -                                                |
| LOAD HIGH                             | LFH      | E3CA   | R<sub>1</sub>.0-31                             | S20 [32 bits]           | -                                                |
| LOAD LOGICAL CHARACTER HIGH           | LLCH     | E3C2   | R<sub>1</sub>.24-31                            | S20 [8 bits]            | -                                                |
| LOAD LOGICAL HALFWORD HIGH            | LLHH     | E3C6   | R<sub>1</sub>.16-31                            | S20 [16 bits]           | -                                                |
| ROTATE THEN INSERT SELECTED BITS HIGH | RISBHG   | EC5D   | R<sub>1</sub>.I<sub>3</sub>-I<sub>4</sub>      | R<sub>2</sub>.0-63      | I<sub>3</sub>, I<sub>4</sub>, I<sub>5</sub>      |
| ROTATE THEN INSERT SELECTED BITS LOW  | RISBLG   | EC51   | R<sub>1</sub>.32+I<sub>3</sub> - 32+I<sub>4</sub> | R<sub>2</sub>.0-63   | I<sub>3</sub>, I<sub>4</sub>, I<sub>5</sub>      |
| STORE CHARACTER HIGH                  | STCH     | E3C3   | R<sub>1</sub>.24-31                            | S20 [8 bits]            | -                                                |
| STORE HALFWORD HIGH                   | STHH     | E3C7   | R<sub>1</sub>.16-31                            | S20 [16 bits]           | -                                                |
| STORE HIGH                            | STFH     | E3CB   | R<sub>1</sub>.0-31                             | S20 [32 bits]           | -                                                |
| SUBTRACT HIGH                         | SHHHR    | B9C9   | R<sub>1</sub>.0-31                             | R<sub>2</sub>.0-31      | R<sub>3</sub>.0-31                               |
| SUBTRACT HIGH                         | SHHLR    | B9D9   | R<sub>1</sub>.0-31                             | R<sub>2</sub>.0-31      | R<sub>3</sub>.32-63                              |
| SUBTRACT LOGICAL HIGH                 | SLHHHR   | B9CB   | R<sub>1</sub>.0-31                             | R<sub>2</sub>.0-31      | R<sub>3</sub>.0-31                               |
| SUBTRACT LOGICAL HIGH                 | SLHHLR   | B9DB   | R<sub>1</sub>.0-31                             | R<sub>2</sub>.0-31      | R<sub>3</sub>.32-63                              |

This slide lists the remaining 18 instructions in the high-word facility, for a total of 30 instructions.



For ADD HIGH (AHHHR), the contents of the leftmost bits (0-31) of the general register designated by the $R_3$ field of the instruction are added to the contents of the leftmost bits of the general register designated by the $R_2$ field of the instruction. The results of the addition replace the leftmost bits of the general register designated by the $R_1$ field of the instruction; bits 32-63 of the result register remain unchanged.

The addition proceeds exactly as for ADD (AR), except that there are two source operands and a separate target operand, and, obviously, the result ends up in the left half of the register.

The condition code is set as with any other signed addition operation.
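To make the data flow concrete, here is a minimal C sketch of AHHHR under stated assumptions: registers are modelled as `uint64_t` values, the function name `ahhhr` is invented for illustration, and the returned condition code follows the usual signed-add convention (0 zero, 1 negative, 2 positive, 3 overflow).

```c
#include <stdint.h>

/* Sketch of ADD HIGH (AHHHR): r1.high = r2.high + r3.high.
   Returns the condition code: 0 zero, 1 negative, 2 positive, 3 overflow.
   The register model and function name are illustrative assumptions. */
static int ahhhr(uint64_t *r1, uint64_t r2, uint64_t r3) {
    uint32_t a = (uint32_t)(r2 >> 32);            /* bits 0-31 of R2 */
    uint32_t b = (uint32_t)(r3 >> 32);            /* bits 0-31 of R3 */
    uint32_t sum = a + b;                         /* wraps modulo 2^32 */
    /* Signed overflow: both operands share a sign that the result lacks. */
    int overflow = (int)((~(a ^ b) & (a ^ sum)) >> 31);
    *r1 = ((uint64_t)sum << 32) | (*r1 & 0xFFFFFFFFu);  /* bits 32-63 kept */
    if (overflow) return 3;
    if (sum == 0) return 0;
    return (sum >> 31) ? 1 : 2;
}
```

Note that the low words of all three registers play no part in the computation, and the low word of the target survives untouched.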



ADD HIGH (AHHLR) should perhaps be called ADD HIGH AND LOW.

The contents of the rightmost bits (32-63) of the general register designated by the  $R_3$  field of the instruction are added to the contents of the leftmost bits (0-31) of the general register designated by the  $R_2$  field of the instruction. The results of the addition replace the leftmost bits of the general register designated by the  $R_1$  operand; bits 32-63 of the result register remain unchanged.

The condition code is set as with any other signed addition operation.
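The only difference from AHHHR is which half of the third operand is used. The sketch below shows just that operand selection; it deliberately leaves the condition code out, models registers as `uint64_t`, and uses an invented function name.

```c
#include <stdint.h>

/* Sketch of the data flow of ADD HIGH (AHHLR):
   r1.high = r2.high + r3.low. The condition code is not modelled here.
   Returns the new 64-bit value of the target register. */
static uint64_t ahhlr(uint64_t r1, uint64_t r2, uint64_t r3) {
    uint32_t sum = (uint32_t)(r2 >> 32) + (uint32_t)r3;  /* high + low */
    return ((uint64_t)sum << 32) | (r1 & 0xFFFFFFFFu);   /* bits 32-63 kept */
}
```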



ADD IMMEDIATE HIGH (AIH) adds the contents of the 32-bit signed $I_2$ field (bits 16-47 of the instruction) to the contents of the leftmost bits (0-31) of the general register designated by the $R_1$ field of the instruction. The results of the addition replace the leftmost bits of the general register designated by the $R_1$ operand; bits 32-63 of the result register remain unchanged.

Unlike ADD HIGH (AHHHR and AHHLR), which write to a separate target register, AIH replaces the leftmost bits of the first-operand register, which is also one of the sources.

The condition code is set as with any other signed addition operation.
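A minimal C sketch of the in-place form follows, again with registers modelled as `uint64_t` and an invented function name. The condition-code values assume the usual signed-add convention (0 zero, 1 negative, 2 positive, 3 overflow).

```c
#include <stdint.h>

/* Sketch of ADD IMMEDIATE HIGH (AIH): r1.high += i2, in place.
   Returns the condition code: 0 zero, 1 negative, 2 positive, 3 overflow.
   The register model and function name are illustrative assumptions. */
static int aih(uint64_t *r1, int32_t i2) {
    uint32_t a = (uint32_t)(*r1 >> 32);           /* bits 0-31 of R1 */
    uint32_t b = (uint32_t)i2;                    /* immediate bit pattern */
    uint32_t sum = a + b;                         /* wraps modulo 2^32 */
    int overflow = (int)((~(a ^ b) & (a ^ sum)) >> 31);
    *r1 = ((uint64_t)sum << 32) | (*r1 & 0xFFFFFFFFu);  /* bits 32-63 kept */
    if (overflow) return 3;
    if (sum == 0) return 0;
    return (sum >> 31) ? 1 : 2;
}
```

For example, calling `aih(&r, -1)` on a register whose high word is 1 leaves a high word of zero and returns condition code 0, while the low word is untouched.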



For ADD LOGICAL HIGH (ALHHHR), the contents of the leftmost bits (0-31) of the general register designated by the  $R_3$  field of the instruction are added to the contents of the leftmost bits of the general register designated by the  $R_2$  field of the instruction. The results of the addition replace the leftmost bits of the general register designated by the  $R_1$  field of the instruction; bits 32-63 of the result register remain unchanged.

The addition proceeds exactly as for ADD LOGICAL (ALR), except that there are two source operands and a separate target operand, and the result is placed in the leftmost 32 bits of the target register.

The condition code is set as with any other unsigned addition operation.



As with ADD HIGH (AHHLR), ADD LOGICAL HIGH (ALHHLR) should perhaps be called ADD LOGICAL HIGH AND LOW.

The contents of the rightmost bits (32-63) of the general register designated by the  $R_3$  field of the instruction are added to the contents of the leftmost bits (0-31) of the general register designated by the  $R_2$  field of the instruction. The results of the addition replace the leftmost bits of the general register designated by the  $R_1$  operand; bits 32-63 of the result register remain unchanged.

The condition code is set as with any other unsigned addition operation.



ADD LOGICAL WITH SIGNED IMMEDIATE HIGH (ALSIH) adds the contents of the 32-bit signed  $I_2$  field (bits 16-47 of the instruction) with the contents of the leftmost <u>unsigned</u> bits (0-31) of the general register designated by the  $R_1$  field of the instruction. The results of the addition replace the leftmost bits of the general register designated by the  $R_1$  operand; bits 32-63 of the result register remain unchanged.

As with ADD IMMEDIATE HIGH, the result replaces the leftmost bits of the first-operand register.

Note that the condition code is set as with any other unsigned addition operation. Although having the second operand be signed reduces the magnitude of the addend by a power of two, it also eliminates the need to define a separate SUBTRACT LOGICAL IMMEDIATE instruction. To subtract, one simply uses a negative second operand.



ADD LOGICAL WITH SIGNED IMMEDIATE HIGH (ALSIHN) is identical to ADD LOGICAL WITH SIGNED IMMEDIATE HIGH (ALSIH), except that the condition code remains unchanged.
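A small Python model may help show why a signed immediate is enough to cover both addition and subtraction of an unsigned value. It is a sketch under stated assumptions: the function names are invented, registers are plain integers, and the condition code is assumed to follow the usual logical-addition pattern (the carry selects between 0/1 and 2/3, and a nonzero result selects the odd value).

```python
MASK32 = 0xFFFFFFFF

def alsih(r1, i2):
    """Model of ALSIH: returns (new R1 value, condition code)."""
    assert -(1 << 31) <= i2 < (1 << 31), "I2 is a 32-bit signed immediate"
    hi = (r1 >> 32) & MASK32
    # Add the two's-complement bit pattern of the immediate as an unsigned value.
    total = hi + (i2 & MASK32)
    result = total & MASK32
    carry = total >> 32
    cc = (2 if carry else 0) + (1 if result else 0)
    return (result << 32) | (r1 & MASK32), cc

def alsihn(r1, i2, cc):
    """Model of ALSIHN: same arithmetic, but the incoming condition code is kept."""
    new_r1, _ = alsih(r1, i2)
    return new_r1, cc
```

Adding -2 to a high word of 5 yields 3 with a carry out, which is the same outcome a logical subtraction without borrow would report.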



BRANCH RELATIVE ON COUNT HIGH (BRCTH) is an analog to BRANCH RELATIVE ON COUNT (BRCT). BRCTH works identically to BRCT, except that the decremented value (that is, the counter) is in the leftmost bits of the general register designated by the R<sub>1</sub> field of the instruction.

The rightmost 32 bits (32-63) of the counting register and the condition code remain unchanged.
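The following Python sketch models a loop controlled by BRCTH, to show that the low word of the counting register stays free for other use. The names `brcth` and `run_loop` are hypothetical; the model assumes the same decrement-then-branch-if-nonzero rule as BRCT.

```python
MASK32 = 0xFFFFFFFF

def brcth(reg):
    """One BRCTH step: decrement bits 0-31; return (new register, branch taken?)."""
    count = (((reg >> 32) & MASK32) - 1) & MASK32
    return (count << 32) | (reg & MASK32), count != 0

def run_loop(reg):
    """Run a loop whose body precedes BRCTH; return (final register, iterations)."""
    iterations = 0
    while True:
        iterations += 1             # stand-in for the loop body
        reg, taken = brcth(reg)
        if not taken:
            break
    return reg, iterations
```

With an initial count of 3 in the high word, the body runs three times and the low word comes out untouched.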



For COMPARE HIGH (CHHR), the contents of the leftmost bits (0-31) of the general register designated by the  $R_2$  field of the instruction are arithmetically compared with the contents of the leftmost bits of the general register designated by the  $R_1$  field of the instruction. The rightmost 32 bits of each register are ignored.



For COMPARE HIGH (CHLR), the contents of the rightmost bits (32-63) of the general register designated by the  $R_2$  field of the instruction are arithmetically compared with the contents of the leftmost bits (0-31) of the general register designated by the  $R_1$  field of the instruction. The rightmost 32 bits of general register  $R_1$  and the leftmost 32 bits of general register  $R_2$  are ignored.



COMPARE HIGH (CHF) is an analog to the COMPARE (C) instruction; the difference being that for CHF, the leftmost 32 bits of the register are compared.

The 32-bit second operand in storage is arithmetically compared with the contents of the leftmost bits of the general register designated by the  $R_1$  field of the instruction. The rightmost 32 bits of general register  $R_1$  are ignored.



COMPARE IMMEDIATE HIGH (CIH) is an analog to the COMPARE IMMEDIATE (CFI) instruction; the difference being that for CIH, the leftmost 32 bits of the register are compared. (CFI was introduced with the general-instruction extension facility in the System z10.)

The 32-bit second immediate field (bits 16-47) of the instruction is arithmetically compared with the contents of the leftmost bits of the general register designated by the  $R_1$  field of the instruction. The rightmost 32 bits of general register  $R_1$  are ignored.



COMPARE LOGICAL HIGH (CLHHR) is an analog to the COMPARE LOGICAL (CLR) instruction; the difference being that for CLHHR, the leftmost 32 bits of the register are compared.

The contents of the leftmost bits (0-31) of the general register designated by the  $R_2$  field of the instruction are logically compared with the contents of the leftmost bits of the general register designated by the  $R_1$  field of the instruction. The rightmost 32 bits of each register are ignored.



For COMPARE LOGICAL HIGH (CLHLR), the contents of the rightmost bits (32-63) of the general register designated by the  $R_2$  field of the instruction are logically compared with the contents of the leftmost bits (0-31) of the general register designated by the  $R_1$  field of the instruction. The rightmost 32 bits of general register  $R_1$  and the leftmost 32 bits of general register  $R_2$  are ignored.



COMPARE LOGICAL HIGH (CLHF) is an analog to the COMPARE LOGICAL (CL) instruction; the difference being that for CLHF, the leftmost 32 bits of the register are compared.

The 32-bit second operand in storage is logically compared with the contents of the leftmost bits of the general register designated by the  $R_1$  field of the instruction. The rightmost 32 bits of general register  $R_1$  are ignored.



COMPARE LOGICAL IMMEDIATE HIGH (CLIH) is an analog to the COMPARE LOGICAL IMMEDIATE (CLFI) instruction; the difference being that for CLIH, the leftmost 32 bits of the register are compared. (CLFI was introduced with the general-instructions extension facility in the System z10.)

The 32-bit second immediate field (bits 16-47) of the instruction is logically compared with the contents of the leftmost bits of the general register designated by the  $R_1$  field of the instruction. The rightmost 32 bits of general register  $R_1$  are ignored.
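The difference between the arithmetic and logical comparisons is easiest to see on a value whose leftmost bit is one. The Python sketch below models CHLR and CLHLR only; the function names are invented, and the condition-code values assume the usual comparison settings (0 = equal, 1 = first operand low, 2 = first operand high).

```python
MASK32 = 0xFFFFFFFF

def high(reg):
    return (reg >> 32) & MASK32     # bits 0-31

def low(reg):
    return reg & MASK32             # bits 32-63

def to_signed32(word):
    return word - (1 << 32) if word & 0x80000000 else word

def compare_cc(a, b):
    return 0 if a == b else (1 if a < b else 2)

def chlr(r1, r2):
    """COMPARE HIGH (CHLR): signed compare of R1 high word with R2 low word."""
    return compare_cc(to_signed32(high(r1)), to_signed32(low(r2)))

def clhlr(r1, r2):
    """COMPARE LOGICAL HIGH (CLHLR): unsigned compare of the same two words."""
    return compare_cc(high(r1), low(r2))
```

A high word of X'FFFFFFFF' compares low against 1 arithmetically (it is -1) but high logically (it is 4294967295).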



LOAD BYTE HIGH (LBH) is the analog to the LOAD BYTE (LB), except that the results are placed in the leftmost bits of the first-operand register. (LOAD BYTE (LB) was introduced with the long-displacement facility in the z990.)

The byte in storage designated by the second-operand location is sign extended on the left and the result is placed in bits 0-31 of the general register designated by the  $R_1$  field of the instruction.



LOAD HALFWORD HIGH (LHH) is the analog to the LOAD HALFWORD (LH), except that the results are placed in the leftmost bits of the first-operand register.

The two-byte field in storage designated by the second-operand location is sign extended on the left and the result is placed in bits 0-31 of the general register designated by the  $R_1$  field of the instruction.



LOAD HIGH (LFH) is the analog to the LOAD (L), except that the results are placed in the leftmost bits of the first-operand register.

The four-byte field in storage designated by the second-operand location is placed in bits 0-31 of the general register designated by the  $R_1$  field of the instruction.



LOAD LOGICAL CHARACTER HIGH (LLCH) is the analog to the LOAD LOGICAL CHARACTER (LLC), except that the results are placed in the leftmost bits of the first-operand register. (LOAD LOGICAL CHARACTER (LLC) was introduced with the extended-immediate facility in the z9-109.)

The byte in storage designated by the second-operand location is zero extended on the left and the result is placed in bits 0-31 of the general register designated by the  $R_1$  field of the instruction.



LOAD LOGICAL HALFWORD HIGH (LLHH) is the analog to the LOAD LOGICAL HALFWORD (LLH), except that the results are placed in the leftmost bits of the first-operand register. (LOAD LOGICAL HALFWORD (LLH) was introduced with the extended-immediate facility in the z9-109.)

The two bytes in storage designated by the second-operand location are zero extended on the left and the result is placed in bits 0-31 of the general register designated by the  $R_1$  field of the instruction.
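The five high-word loads differ only in operand length and in whether the value is sign extended or zero extended. The Python sketch below models them over a `bytes` object standing in for storage; the function names mirror the mnemonics but are otherwise invented, and big-endian byte order is assumed.

```python
MASK32 = 0xFFFFFFFF

def _load_high(reg, storage, addr, length, signed):
    """Fetch `length` bytes, extend to 32 bits, and place them in bits 0-31."""
    value = int.from_bytes(storage[addr:addr + length], "big", signed=signed)
    return ((value & MASK32) << 32) | (reg & MASK32)

def lbh(reg, storage, addr):        # LOAD BYTE HIGH: sign extended
    return _load_high(reg, storage, addr, 1, True)

def lhh(reg, storage, addr):        # LOAD HALFWORD HIGH: sign extended
    return _load_high(reg, storage, addr, 2, True)

def lfh(reg, storage, addr):        # LOAD HIGH: full word, no extension needed
    return _load_high(reg, storage, addr, 4, False)

def llch(reg, storage, addr):       # LOAD LOGICAL CHARACTER HIGH: zero extended
    return _load_high(reg, storage, addr, 1, False)

def llhh(reg, storage, addr):       # LOAD LOGICAL HALFWORD HIGH: zero extended
    return _load_high(reg, storage, addr, 2, False)
```

In every case the low word of the register is carried through unchanged.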



ROTATE THEN INSERT SELECTED BITS HIGH (RISBHG) is the analog to ROTATE THEN INSERT SELECTED BITS (RISBG), except that the results of RISBHG are limited to the leftmost bits of general register $R_1$. Note that ROTATE THEN INSERT SELECTED BITS (RISBG) was introduced with the general-instructions-extension facility on the System z10.

All 64 bits of the second operand are rotated to the left by the number of bits specified in the fifth operand (note, if the fifth operand is coded as a negative value, the rotation appears to occur to the right).

The  $I_3$  and  $I_4$  fields of the instruction are used to specify a starting and ending bit position in the result register (that is, the general register designated by the  $R_1$  field of the instruction). The selected bits of the rotated second operand are inserted into the corresponding bits of the result register.

The remaining bits of the leftmost 32 bits of the result register are either left unchanged or set to zeros, depending on whether the zero-remaining-bits control (bit 0 of the $I_4$ field of the instruction) is zero or one, respectively.

Unless the  $R_1$  and  $R_2$  fields designate the same register, the general register designated by the  $R_2$  field of the instruction remains unchanged. The rightmost 32 bits of the general register designated by the  $R_1$  field always remain unchanged.



ROTATE THEN INSERT SELECTED BITS LOW (RISBLG) is the analog to ROTATE THEN INSERT SELECTED BITS (RISBG), except that the results of RISBLG are limited to the rightmost bits of general register $R_1$. Note that ROTATE THEN INSERT SELECTED BITS (RISBG) was introduced with the general-instructions-extension facility on the System z10.

All 64 bits of the second operand are rotated to the left by the number of bits specified in the fifth operand (note, if the fifth operand is coded as a negative value, the rotation appears to occur to the right).

The  $I_3$  and  $I_4$  fields of the instruction are used to specify a starting and ending bit position in the rightmost 32 bits of the result register (that is, the general register designated by the  $R_1$  field of the instruction). Although the values of the  $I_3$  and  $I_4$  fields are each encoded in a range of 0-31, the effective bit positions in the 64-bit register are 32 bits higher. The selected (rightmost 32) bits of the rotated second operand are inserted into the corresponding (rightmost 32) bits of the result register.

The remaining bits of the rightmost 32 bits of the result register are either left unchanged or set to zeros, depending on whether the zero-remaining-bits control (bit 0 of the $I_4$ field of the instruction) is zero or one, respectively.

Unless the  $R_1$  and  $R_2$  fields designate the same register, the general register designated by the  $R_2$  field of the instruction remains unchanged. The leftmost 32 bits of the general register designated by the  $R_1$  field always remain unchanged.



Note that for RISBHG, the $I_3$ and $I_4$ fields directly designate bits 0-31 of the result register. For RISBLG, a binary one is implicitly prepended to the values coded in the $I_3$ and $I_4$ fields; thus, the effective bit positions in the result register are 32-63.

Unlike ROTATE THEN INSERT SELECTED BITS (RISBG), the ROTATE THEN INSERT SELECTED BITS HIGH / LOW instructions do not set the condition code. This allows the instructions to be used to implement pseudo-instructions (see the next slide).

| Instruction Name                   | Extended Mnemonic                     | RISBHG / RISBLG Equivalent                        |
|------------------------------------|---------------------------------------|---------------------------------------------------|
| LOAD (HIGH←HIGH)                   | LHHR R<sub>1</sub>,R<sub>2</sub>      | RISBHGZ R<sub>1</sub>,R<sub>2</sub>,0,31          |
| LOAD (HIGH←LOW)                    | LHLR R<sub>1</sub>,R<sub>2</sub>      | RISBHGZ R<sub>1</sub>,R<sub>2</sub>,0,31,32       |
| LOAD (LOW←HIGH)                    | LLHFR R<sub>1</sub>,R<sub>2</sub>     | RISBLGZ R<sub>1</sub>,R<sub>2</sub>,0,31,32       |
| LOAD LOGICAL HALFWORD (HIGH←HIGH)  | LLHHHR R<sub>1</sub>,R<sub>2</sub>    | RISBHGZ R<sub>1</sub>,R<sub>2</sub>,16,31         |
| LOAD LOGICAL HALFWORD (HIGH←LOW)   | LLHHLR R<sub>1</sub>,R<sub>2</sub>    | RISBHGZ R<sub>1</sub>,R<sub>2</sub>,16,31,32      |
| LOAD LOGICAL HALFWORD (LOW←HIGH)   | LLHLHR R<sub>1</sub>,R<sub>2</sub>    | RISBLGZ R<sub>1</sub>,R<sub>2</sub>,16,31,32      |
| LOAD LOGICAL CHARACTER (HIGH←HIGH) | LLCHHR R<sub>1</sub>,R<sub>2</sub>    | RISBHGZ R<sub>1</sub>,R<sub>2</sub>,24,31         |
| LOAD LOGICAL CHARACTER (HIGH←LOW)  | LLCHLR R<sub>1</sub>,R<sub>2</sub>    | RISBHGZ R<sub>1</sub>,R<sub>2</sub>,24,31,32      |
| LOAD LOGICAL CHARACTER (LOW←HIGH)  | LLCLHR R<sub>1</sub>,R<sub>2</sub>    | RISBLGZ R<sub>1</sub>,R<sub>2</sub>,24,31,32      |

With ROTATE THEN INSERT SELECTED BITS HIGH and ROTATE THEN INSERT SELECTED BITS LOW, a large group of other pseudo-instructions can be implemented, as illustrated on this slide.

The High-Level Assembler provides extended mnemonics that implement these pseudo-instructions, even though they are actually implemented with RISBHG and RISBLG.
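To see how a rotate-and-insert can act as a register-to-register load, the following Python sketch models RISBHG and RISBLG and replays three of the table entries. It is a simplified model, not the architectural definition: the function names are invented, the zero-remaining-bits control is passed as a separate boolean rather than encoded in an immediate field, and only ranges with start ≤ end are handled.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def rotl64(value, amount):
    """Rotate a 64-bit value left by `amount` bit positions."""
    amount %= 64
    return ((value << amount) | (value >> (64 - amount))) & MASK64

def bit_mask(start, end):
    """Mask of bits start..end inclusive, where bit 0 is the leftmost of 64."""
    return ((1 << (end - start + 1)) - 1) << (63 - end)

def risbhg(r1, r2, start, end, rotate=0, zero_remaining=False):
    """Insert selected bits of rotated R2 into bits 0-31 of R1."""
    assert 0 <= start <= end <= 31
    sel = bit_mask(start, end)
    base = (r1 & MASK32) if zero_remaining else r1   # optionally clear high word
    return (base & ~sel) | (rotl64(r2, rotate) & sel)

def risblg(r1, r2, start, end, rotate=0, zero_remaining=False):
    """Insert selected bits of rotated R2 into bits 32-63 of R1."""
    assert 0 <= start <= end <= 31
    sel = bit_mask(start + 32, end + 32)             # coded 0-31 means bits 32-63
    base = (r1 & ~MASK32 & MASK64) if zero_remaining else r1
    return (base & ~sel) | (rotl64(r2, rotate) & sel)

def lhlr(r1, r2):       # LOAD (HIGH<-LOW)
    return risbhg(r1, r2, 0, 31, 32, zero_remaining=True)

def llhfr(r1, r2):      # LOAD (LOW<-HIGH)
    return risblg(r1, r2, 0, 31, 32, zero_remaining=True)

def llchlr(r1, r2):     # LOAD LOGICAL CHARACTER (HIGH<-LOW)
    return risbhg(r1, r2, 24, 31, 32, zero_remaining=True)
```

Rotating by 32 swaps the two halves of the source, so the selected range then picks up bits that originally sat in the other word.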

| Instruction Name         | Extended Mnemonic                | R*SBG Equivalent                              |
|--------------------------|----------------------------------|-----------------------------------------------|
| AND HIGH (HIGH←HIGH)     | NHHR R<sub>1</sub>,R<sub>2</sub> | RNSBG R<sub>1</sub>,R<sub>2</sub>,0,31        |
| AND HIGH (HIGH←LOW)      | NHLR R<sub>1</sub>,R<sub>2</sub> | RNSBG R<sub>1</sub>,R<sub>2</sub>,0,31,32     |
| AND HIGH (LOW←HIGH)      | NLHR R<sub>1</sub>,R<sub>2</sub> | RNSBG R<sub>1</sub>,R<sub>2</sub>,32,63,32    |
| EXCLUSIVE OR (HIGH←HIGH) | XHHR R<sub>1</sub>,R<sub>2</sub> | RXSBG R<sub>1</sub>,R<sub>2</sub>,0,31        |
| EXCLUSIVE OR (HIGH←LOW)  | XHLR R<sub>1</sub>,R<sub>2</sub> | RXSBG R<sub>1</sub>,R<sub>2</sub>,0,31,32     |
| EXCLUSIVE OR (LOW←HIGH)  | XLHR R<sub>1</sub>,R<sub>2</sub> | RXSBG R<sub>1</sub>,R<sub>2</sub>,32,63,32    |
| OR (HIGH←HIGH)           | OHHR R<sub>1</sub>,R<sub>2</sub> | ROSBG R<sub>1</sub>,R<sub>2</sub>,0,31        |
| OR (HIGH←LOW)            | OHLR R<sub>1</sub>,R<sub>2</sub> | ROSBG R<sub>1</sub>,R<sub>2</sub>,0,31,32     |
| OR (LOW←HIGH)            | OLHR R<sub>1</sub>,R<sub>2</sub> | ROSBG R<sub>1</sub>,R<sub>2</sub>,32,63,32    |

The High-Level Assembler also provides pseudo-instructions to perform high-word logical operations by using the ROTATE THEN AND SELECTED BITS, ROTATE THEN OR SELECTED BITS, and ROTATE THEN EXCLUSIVE OR SELECTED BITS instructions (RNSBG, ROSBG, and RXSBG were introduced with the System z10).
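The same rotate-and-select idea covers the logical pseudo-instructions. The Python sketch below models the HIGH←LOW forms only, using a start of 0, an end of 31, and a rotate amount of 32. The names are invented for this example, and the test-results control and the condition code are deliberately left out of the model.

```python
import operator

MASK64 = (1 << 64) - 1
HIGH_WORD = 0xFFFFFFFF00000000      # bits 0-31 of the register

def rotl64(value, amount):
    amount %= 64
    return ((value << amount) | (value >> (64 - amount))) & MASK64

def rotate_then_op_high(r1, r2, rotate, op):
    """Combine bits 0-31 of R1 with the same bits of rotated R2 using `op`."""
    combined = op(r1, rotl64(r2, rotate))
    # Bits outside the selected range keep their original R1 contents.
    return (r1 & ~HIGH_WORD & MASK64) | (combined & HIGH_WORD)

def nhlr(r1, r2):       # AND HIGH (HIGH<-LOW)
    return rotate_then_op_high(r1, r2, 32, operator.and_)

def ohlr(r1, r2):       # OR (HIGH<-LOW)
    return rotate_then_op_high(r1, r2, 32, operator.or_)

def xhlr(r1, r2):       # EXCLUSIVE OR (HIGH<-LOW)
    return rotate_then_op_high(r1, r2, 32, operator.xor)
```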



STORE CHARACTER HIGH (STCH) is the analog to STORE CHARACTER (STC), except that the byte stored is in bits 24-31 of the general register designated by the R<sub>1</sub> field of the instruction.

Bits 24-31 of general register  $R_1$  are placed into the byte in storage designated by the second-operand location.



STORE HALFWORD HIGH (STHH) is the analog to STORE HALFWORD (STH), except that the two bytes stored are in bits 16-31 of the general register designated by the R<sub>1</sub> field of the instruction.

Bits 16-31 of general register  $R_1$  are placed into the two bytes in storage designated by the second-operand location.



STORE HIGH (STFH) is the analog to STORE (ST), except that the four bytes stored are in bits 0-31 of the general register designated by the  $R_1$  field of the instruction.

Bits 0-31 of general register  $R_1$  are placed into the four bytes in storage designated by the second-operand location.
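The three high-word stores pick progressively wider slices from the right end of the high word. A minimal Python sketch, assuming a `bytearray` as a stand-in for storage and invented function names, makes the bit ranges concrete:

```python
MASK32 = 0xFFFFFFFF

def stch(reg, storage, addr):
    """STORE CHARACTER HIGH: bits 24-31 of the register go to one byte."""
    storage[addr] = (reg >> 32) & 0xFF

def sthh(reg, storage, addr):
    """STORE HALFWORD HIGH: bits 16-31 of the register go to two bytes."""
    storage[addr:addr + 2] = ((reg >> 32) & 0xFFFF).to_bytes(2, "big")

def stfh(reg, storage, addr):
    """STORE HIGH: bits 0-31 of the register go to four bytes."""
    storage[addr:addr + 4] = ((reg >> 32) & MASK32).to_bytes(4, "big")
```

With a register of X'11223344AABBCCDD', the stores write X'44', X'3344', and X'11223344' respectively; the low word never reaches storage.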



For SUBTRACT HIGH (SHHHR), the contents of the leftmost bits (0-31) of the general register designated by the  $R_3$  field of the instruction are arithmetically subtracted from the contents of the leftmost bits of the general register designated by the  $R_2$  field of the instruction. The difference replaces the leftmost bits of the general register designated by the  $R_1$  field of the instruction; bits 32-63 of the result register remain unchanged.

The subtraction proceeds exactly as for SUBTRACT (SR), except that there are two source operands and a separate target operand – and, obviously, the result ends up in the leftmost 32 bits of the register.

The condition code is set as with any other signed subtraction operation.



SUBTRACT HIGH (SHHLR) should perhaps be called SUBTRACT LOW FROM HIGH.

The contents of the rightmost bits (32-63) of the general register designated by the  $R_3$  field of the instruction are arithmetically subtracted from the contents of the leftmost bits (0-31) of the general register designated by the  $R_2$  field of the instruction. The difference replaces the leftmost bits of the general register designated by the  $R_1$  operand; bits 32-63 of the result register remain unchanged.

The condition code is set as with any other signed subtraction operation.



For SUBTRACT LOGICAL HIGH (SLHHHR), the contents of the leftmost bits (0-31) of the general register designated by the $R_3$ field of the instruction are logically subtracted from the contents of the leftmost bits of the general register designated by the $R_2$ field of the instruction. The difference replaces the leftmost bits of the general register designated by the $R_1$ field of the instruction; bits 32-63 of the result register remain unchanged.

The subtraction proceeds exactly as for SUBTRACT LOGICAL (SLR), except that there are two source operands and a separate target operand – and, obviously, the result ends up in the leftmost 32 bits of the register.

The condition code is set as with any other unsigned subtraction operation.



As with SUBTRACT HIGH (SHHLR), SUBTRACT LOGICAL HIGH (SLHHLR) should perhaps be called SUBTRACT LOGICAL LOW FROM HIGH.

The contents of the rightmost bits (32-63) of the general register designated by the  $R_3$  field of the instruction are logically subtracted from the contents of the leftmost bits (0-31) of the general register designated by the  $R_2$  field of the instruction. The difference replaces the leftmost bits of the general register designated by the  $R_1$  operand; bits 32-63 of the result register remain unchanged.

The condition code is set as with any other unsigned subtraction operation.
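As a rough illustration of the low-from-high form, the Python sketch below models SLHHLR. The function name is invented, and the condition-code values assume the usual logical-subtraction settings (1 = nonzero result with borrow, 2 = zero result without borrow, 3 = nonzero result without borrow).

```python
MASK32 = 0xFFFFFFFF

def slhhlr(r1, r2, r3):
    """Model of SLHHLR: R1 high word = R2 high word - R3 low word (unsigned)."""
    minuend = (r2 >> 32) & MASK32       # bits 0-31 of R2
    subtrahend = r3 & MASK32            # bits 32-63 of R3
    result = (minuend - subtrahend) & MASK32
    if minuend < subtrahend:
        cc = 1                          # borrow; the result cannot be zero
    elif result == 0:
        cc = 2
    else:
        cc = 3
    return (result << 32) | (r1 & MASK32), cc
```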



The interlocked-access facility provides instructions that are designed to facilitate multiprogramming; most of the instructions access memory in a block-concurrent, interlocked-update fashion (more details on the next slides).

Also, when the interlocked-access facility is installed, the ADD IMMEDIATE (ASI and AGSI) and ADD LOGICAL WITH SIGNED IMMEDIATE (ALSI and ALGSI) instructions perform their storage accesses using block-concurrent, interlocked update when the storage operand is aligned on an integral boundary. Thus, as observed by other CPUs and the channel subsystem, the fetch, addition, and store of the result appear to occur atomically; there is no need for a COMPARE AND SWAP loop to perform these operations!

| Instruction           | Mnemonic | OpCode | 1<sup>st</sup> Operand | 2<sup>nd</sup> Operand | 3<sup>rd</sup> Operand                           |
|-----------------------|----------|--------|------------------------|------------------------|--------------------------------------------------|
| LOAD AND ADD          | LAA      | EBF8   | R<sub>1</sub>.32-63    | S12 [32 bits]          | R<sub>3</sub>.32-63                              |
| LOAD AND ADD          | LAAG     | EBE8   | R<sub>1</sub>.0-63     | S12 [64 bits]          | R<sub>3</sub>.0-63                               |
| LOAD AND ADD LOGICAL  | LAAL     | EBFA   | R<sub>1</sub>.32-63    | S12 [32 bits]          | R<sub>3</sub>.32-63                              |
| LOAD AND ADD LOGICAL  | LAALG    | EBEA   | R<sub>1</sub>.0-63     | S12 [64 bits]          | R<sub>3</sub>.0-63                               |
| LOAD AND AND          | LAN      | EBF4   | R<sub>1</sub>.32-63    | S12 [32 bits]          | R<sub>3</sub>.32-63                              |
| LOAD AND AND          | LANG     | EBE4   | R<sub>1</sub>.0-63     | S12 [64 bits]          | R<sub>3</sub>.0-63                               |
| LOAD AND EXCLUSIVE OR | LAX      | EBF7   | R<sub>1</sub>.32-63    | S12 [32 bits]          | R<sub>3</sub>.32-63                              |
| LOAD AND EXCLUSIVE OR | LAXG     | EBE7   | R<sub>1</sub>.0-63     | S12 [64 bits]          | R<sub>3</sub>.0-63                               |
| LOAD AND OR           | LAO      | EBF6   | R<sub>1</sub>.32-63    | S12 [32 bits]          | R<sub>3</sub>.32-63                              |
| LOAD AND OR           | LAOG     | EBE6   | R<sub>1</sub>.0-63     | S12 [64 bits]          | R<sub>3</sub>.0-63                               |
| LOAD PAIR DISJOINT    | LPD      | C84    | S12 [32 bits]          | S12 [32 bits]          | R<sub>3</sub>.32-63<br>R<sub>3</sub>+1.32-63     |
| LOAD PAIR DISJOINT    | LPDG     | C85    | S12 [64 bits]          | S12 [64 bits]          | R<sub>3</sub>.0-63<br>R<sub>3</sub>+1.0-63       |

- The interlocked-access facility comprises two types of arithmetic operations (signed addition and unsigned addition) and three types of logical operations (AND, OR, and XOR). For each of these five operations, the instruction performs the following:
  1. The second-operand storage location is fetched.
  2. An operation is performed using the contents of the third-operand register and the storage location, with the result being placed into the storage location. The access of the storage location (beginning with the fetch in step 1, through the store in this step) is performed as a block-concurrent, interlocked update (that is, it is atomic).
  3. The original second-operand value (prior to any modification in step 2) is placed in the first-operand register.
- The illustrative sequence of the operation shown on the following slides differs somewhat from that described here; however, the result is the same.
- The facility also includes an operation to access two discrete storage locations, providing an indication as to whether any other CPU altered one of the locations during the fetch.

For each of these operations, there is a 32-bit and a 64-bit version of the instruction.
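The three steps above amount to an atomic fetch-and-operate that hands back the old value. The Python sketch below imitates that behavior for the 32-bit forms. It is an analogy only: a `threading.Lock` stands in for the hardware interlock, and the `Word` class, the `load_and_op` method, and the `hammer` helper are all invented for this example.

```python
import operator
import threading

MASK32 = 0xFFFFFFFF

class Word:
    """A 32-bit storage word whose updates are serialized by a lock."""
    def __init__(self, value=0):
        self.value = value & MASK32
        self._lock = threading.Lock()   # stand-in for the interlocked update

    def load_and_op(self, r3, op):
        with self._lock:
            original = self.value                     # step 1: fetch
            self.value = op(original, r3) & MASK32    # step 2: operate and store
        return original                               # step 3: old value to R1

def laa(word, r3): return word.load_and_op(r3, operator.add)
def lan(word, r3): return word.load_and_op(r3, operator.and_)
def lao(word, r3): return word.load_and_op(r3, operator.or_)
def lax(word, r3): return word.load_and_op(r3, operator.xor)

def hammer(word, threads=4, per_thread=1000):
    """Increment `word` from several threads; no update should be lost."""
    def worker():
        for _ in range(per_thread):
            laa(word, 1)
    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return word.value
```

Because each caller receives the value as it stood before its own update, LOAD AND ADD can hand out unique ticket numbers or bump a shared counter without a retry loop.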



For LOAD AND ADD (LAA), the contents of bits 32-63 of the general register designated by the  $R_3$  field of the instruction are preserved in a temporary location in the CPU. Then the word in storage designated by the second-operand location is fetched into bits 32-63 of the general register designated by the  $R_1$  field of the instruction. Finally, the temporary 32-bit value is added to the contents of the word in storage, and the result replaces the word in storage. As observed by other CPUs and the channel subsystem, the fetching and storing of the word in storage appear to occur as a block-concurrent interlocked update.

Alternatively, the word in storage may be fetched into a temporary location, the contents of general register $R_3$ added to the word in storage, and then the temporary value placed in general register $R_1$. Regardless of method, the fetching into a temporary location ensures that the result in general register $R_1$ is the original contents of the storage location (prior to the addition).



For LOAD AND ADD (LAAG), the contents of bits 0-63 of the general register designated by the  $R_3$  field of the instruction are preserved in a temporary location in the CPU. Then the doubleword in storage designated by the second-operand location is fetched into bits 0-63 of the general register designated by the  $R_1$  field of the instruction. Finally, the temporary 64-bit value is added to the contents of the doubleword in storage, and the result replaces the doubleword in storage. As observed by other CPUs and the channel subsystem, the fetching and storing of the doubleword in storage appear to occur as a block-concurrent interlocked update.



The operation of LOAD AND ADD LOGICAL (LAAL) is identical to that of LOAD AND ADD (LAA), except for the setting of the condition code. LAAL sets the condition code consistent with other unsigned additions.



The operation of LOAD AND ADD LOGICAL (LAALG) is identical to that of LOAD AND ADD (LAAG), except for the setting of the condition code. LAALG sets the condition code consistent with other unsigned additions.



For LOAD AND AND (LAN), the contents of bits 32-63 of the general register designated by the  $R_3$  field of the instruction are preserved in a temporary location in the CPU. Then the word in storage designated by the second-operand location is fetched into bits 32-63 of the general register designated by the  $R_1$  field of the instruction. Finally, the temporary 32-bit value is logically ANDed with the contents of the word in storage, and the result replaces the word in storage. As observed by other CPUs and the channel subsystem, the fetching and storing of the word in storage appear to occur as a block-concurrent interlocked update.



For LOAD AND AND (LANG), the contents of bits 0-63 of the general register designated by the  $R_3$  field of the instruction are preserved in a temporary location in the CPU. Then the doubleword in storage designated by the second-operand location is fetched into bits 0-63 of the general register designated by the  $R_1$  field of the instruction. Finally, the temporary 64-bit value is logically ANDed with the contents of the doubleword in storage, and the result replaces the doubleword in storage. As observed by other CPUs and the channel subsystem, the fetching and storing of the doubleword in storage appear to occur as a block-concurrent interlocked update.



For LOAD AND EXCLUSIVE OR (LAX), the contents of bits 32-63 of the general register designated by the  $R_3$  field of the instruction are preserved in a temporary location in the CPU. Then the word in storage designated by the second-operand location is fetched into bits 32-63 of the general register designated by the  $R_1$  field of the instruction. Finally, the temporary 32-bit value is logically exclusive ORed with the contents of the word in storage, and the result replaces the word in storage. As observed by other CPUs and the channel subsystem, the fetching and storing of the word in storage appear to occur as a block-concurrent interlocked update.



For LOAD AND EXCLUSIVE OR (LAXG), the contents of bits 0-63 of the general register designated by the  $R_3$  field of the instruction are preserved in a temporary location in the CPU. Then the doubleword in storage designated by the second-operand location is fetched into bits 0-63 of the general register designated by the  $R_1$  field of the instruction. Finally, the temporary 64-bit value is logically exclusive ORed with the contents of the doubleword in storage, and the result replaces the doubleword in storage. As observed by other CPUs and the channel subsystem, the fetching and storing of the doubleword in storage appear to occur as a block-concurrent interlocked update.



For LOAD AND OR (LAO), the contents of bits 32-63 of the general register designated by the  $R_3$  field of the instruction are preserved in a temporary location in the CPU. Then the word in storage designated by the second-operand location is fetched into bits 32-63 of the general register designated by the  $R_1$  field of the instruction. Finally, the temporary 32-bit value is logically ORed with the contents of the word in storage, and the result replaces the word in storage. As observed by other CPUs and the channel subsystem, the fetching and storing of the word in storage appear to occur as a block-concurrent interlocked update.



For LOAD AND OR (LAOG), the contents of bits 0-63 of the general register designated by the  $R_3$  field of the instruction are preserved in a temporary location in the CPU. Then the doubleword in storage designated by the second-operand location is fetched into bits 0-63 of the general register designated by the  $R_1$  field of the instruction. Finally, the temporary 64-bit value is logically ORed with the contents of the doubleword in storage, and the result replaces the doubleword in storage. As observed by other CPUs and the channel subsystem, the fetching and storing of the doubleword in storage appear to occur as a block-concurrent interlocked update.



For LOAD PAIR DISJOINT (LPD), the first and second operands are two distinct words in storage. The first and second operands are fetched into bits 32-63 of the even-odd general register pair designated by the  $R_3$  field of the instruction; the first operand is fetched into the even-numbered register, and the second operand is fetched into the odd-numbered register.

The condition code is set based on whether the pair of words were fetched without alteration by other CPUs or the channel subsystem. CC0 means that neither word was altered during the fetching; CC3 means that one of the words was altered.



For LOAD PAIR DISJOINT (LPDG), the first and second operands are two distinct doublewords in storage. The first and second operands are fetched into bits 0-63 of the even-odd general register pair designated by the  $R_3$  field of the instruction; the first operand is fetched into the even-numbered register, and the second operand is fetched into the odd-numbered register.

The condition code is set based on whether the pair of doublewords were fetched without alteration by other CPUs or the channel subsystem. CC0 means that neither doubleword was altered during the fetching; CC3 means that one of the doublewords was altered.



The load-and-store-on-condition facility provides a means of executing a load or store, subject to the control of the condition code. Therefore, no branch instructions are necessary to select the various code paths that effect the loading or storing. Consider the following code fragment that implements a max() function for four storage parameters (each branch skips the load when the value already in register 15 is not lower than the parameter):

| Label | Operation | Operands |
|-------|-----------|----------|
|       | LG        | 15,PARM1 |
|       | CG        | 15,PARM2 |
|       | JNL       | A        |
|       | LG        | 15,PARM2 |
| A     | CG        | 15,PARM3 |
|       | JNL       | B        |
|       | LG        | 15,PARM3 |
| B     | CG        | 15,PARM4 |
|       | JNL       | C        |
|       | LG        | 15,PARM4 |
| C     |           |          |

With the load-and-store-on-condition facility, equivalent function can be realized without all the branching instructions, as follows:

| Operation | Operands         | Equivalent Extended Mnemonic |
|-----------|------------------|------------------------------|
| LG        | 15,PARM1         |                              |
| CG        | 15,PARM2         |                              |
| LOCG      | 15,PARM2,B'0100' | LOCGL 15,PARM2               |
| CG        | 15,PARM3         |                              |
| LOCG      | 15,PARM3,B'0100' | LOCGL 15,PARM3               |
| CG        | 15,PARM4         |                              |
| LOCG      | 15,PARM4,B'0100' | LOCGL 15,PARM4               |
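The branch-free sequence can be traced with a small Python model. This is a sketch with invented names (`cg`, `locg`, `select4`); it assumes that the four mask bits, from left to right, correspond to condition codes 0, 1, 2, and 3, so B'0100' selects condition code 1 (first operand low).

```python
def cg(reg, value):
    """COMPARE: condition code 0 = equal, 1 = register low, 2 = register high."""
    return 0 if reg == value else (1 if reg < value else 2)

def locg(reg, value, mask, cc):
    """LOAD ON CONDITION: load `value` only if the mask bit for `cc` is one."""
    return value if mask & (8 >> cc) else reg

def select4(parm1, parm2, parm3, parm4):
    """Replay LG / CG / LOCG ... with mask B'0100' for each later parameter."""
    r15 = parm1                         # LG   15,PARM1
    for parm in (parm2, parm3, parm4):
        cc = cg(r15, parm)              # CG   15,PARMn
        r15 = locg(r15, parm, 0b0100, cc)   # LOCG 15,PARMn,B'0100'
    return r15
```

With mask B'0100', the load happens only when register 15 compares low, so the register ends up holding the largest of the four values; a mask of B'0010' (load when high) would select the smallest instead.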

| Instruction                                                                          | Mnemonic              | OpCode        | 1 <sup>st</sup> Operand | 2 <sup>nd</sup> Operand | 3 <sup>rd</sup> Operand |
|--------------------------------------------------------------------------------------|-----------------------|---------------|-------------------------|-------------------------|-------------------------|
| LOAD ON CONDITION                                                                    | LOCR                  | B9F2          | R <sub>1</sub> .32-63   | R <sub>2</sub> .32-63   | Condition Mask          |
| LOAD ON CONDITION                                                                    | LOCGR                 | B9E2          | R <sub>1</sub> .0-63    | R <sub>2</sub> .0-63    | Condition Mask          |
| LOAD ON CONDITION                                                                    | LOC                   | EBF2          | R <sub>1</sub> .32-63   | S20 [32 bits]           | Condition Mask          |
| LOAD ON CONDITION                                                                    | LOCG                  | EBE2          | R <sub>1</sub> .0-63    | S20 [64 bits]           | Condition Mask          |
| STORE ON CONDITION                                                                   | STOC                  | EBF3          | R <sub>1</sub> .32-63   | S20 [32 bits]           | Condition Mask          |
| STORE ON CONDITION                                                                   | STOCG                 | EBE3          | R <sub>1</sub> .0-63    | S20 [64 bits]           | Condition Mask          |

Explanation: R<sub>n</sub> = register operand n; S20 = storage operand designated by a base register with a 20-bit signed long displacement.

For LOAD ON CONDITION, there are two forms of second operand: one source is a register and the other is a storage operand. For STORE ON CONDITION, the second operand is a storage operand. For each of these, there is an instruction that operates on 32-bit values and one that operates on 64-bit values.

As noted on the previous slide, the High-Level Assembler implements extended mnemonics for the load-and-store-on-condition facility. The extended mnemonic is formed by adding a suffix to one of the six basic mnemonics. When an extended mnemonic is coded, the conditional mask operand (the  $M_3$  field) is not coded.

The extended mnemonics represent the conditions that would be expected after a comparison operation: E, H, L, NE, NH, and NL. As the expected usage is following a compare instruction, HLASM does not provide extended mnemonics for other conditions (particularly CC3). However, the programmer can specify these conditions by using the  $M_3$  field.
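As a rough illustration of how an extended mnemonic expands, the sketch below splits a mnemonic into its base instruction and the implied mask. The mask values for NE, NH, and NL are an assumption that the suffixes follow the same convention as the branch extended mnemonics (mask bits, left to right, select CC0 through CC3); `expand` and the table names are hypothetical helpers, not part of HLASM.

```python
# Mask bits, left to right, select condition codes 0, 1, 2, and 3.
SUFFIX_MASKS = {
    "E": 0b1000, "H": 0b0010, "L": 0b0100,
    "NE": 0b0111, "NH": 0b1101, "NL": 0b1011,   # assumed, as for branches
}
# Longest base mnemonics first, so LOCGR is tried before LOCG and LOC.
BASE_MNEMONICS = ("LOCGR", "STOCG", "LOCR", "LOCG", "STOC", "LOC")


def expand(extended):
    """Return (base mnemonic, M3 mask) for an extended mnemonic such as LOCGL."""
    for base in BASE_MNEMONICS:
        suffix = extended[len(base):]
        if extended.startswith(base) and suffix in SUFFIX_MASKS:
            return base, SUFFIX_MASKS[suffix]
    raise ValueError("not a load/store-on-condition extended mnemonic: " + extended)
```

For example, `expand("LOCGL")` yields `("LOCG", 0b0100)`, which matches the B'0100' mask used in the earlier code fragment.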



This slide illustrates the operation of LOAD ON CONDITION (LOCR).

If the condition specified in the  $M_3$  field of the instruction (or specified by the extended mnemonic) is true, bits 32-63 of the general register specified by the  $R_2$  field of the instruction are copied into the corresponding bits of the general register specified by the  $R_1$  field; bits 0-31 of the register specified by the  $R_1$  field remain unchanged.

If the condition specified by the  $M_3$  field (or extended mnemonic) is not true, all bits in the general register specified by the  $R_1$  field remain unchanged.



This slide illustrates the operation of LOAD ON CONDITION (LOCGR).

If the condition specified in the  $M_3$  field of the instruction (or specified by the extended mnemonic) is true, bits 0-63 of the general register specified by the  $R_2$  field of the instruction are copied into the corresponding bits of the general register specified by the  $R_1$  field.

If the condition specified by the  $M_3$  field (or extended mnemonic) is not true, all bits in the general register specified by the  $R_1$  field remain unchanged.
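The register forms can be summarised with a small Python model. It is a sketch only: registers are represented as 64-bit integers, and the function names simply echo the mnemonics.

```python
MASK64 = (1 << 64) - 1
LOW32 = 0xFFFFFFFF
HIGH32 = LOW32 << 32


def condition_met(cc, m3):
    """The leftmost mask bit selects CC0, the rightmost selects CC3."""
    return bool(m3 & (0b1000 >> cc))


def locr(r1, r2, cc, m3):
    """32-bit form: only bits 32-63 of R1 can change."""
    if not condition_met(cc, m3):
        return r1
    return (r1 & HIGH32) | (r2 & LOW32)


def locgr(r1, r2, cc, m3):
    """64-bit form: all of R1 is replaced when the condition is met."""
    return (r2 & MASK64) if condition_met(cc, m3) else r1
```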



This slide illustrates the operation of LOAD ON CONDITION (LOC).

If the condition specified in the  $M_3$  field of the instruction (or specified by the extended mnemonic) is true, the four bytes designated by the second-operand location are copied into bits 32-63 of the general register specified by the  $R_1$  field; bits 0-31 of the register remain unchanged.

If the condition specified by the  $M_3$  field (or extended mnemonic) is not true, all bits in the general register specified by the  $R_1$  field remain unchanged.



This slide illustrates the operation of LOAD ON CONDITION (LOCG).

If the condition specified in the  $M_3$  field of the instruction (or specified by the extended mnemonic) is true, the eight bytes designated by the second-operand location are copied into bits 0-63 of the general register specified by the  $R_1$  field.

If the condition specified by the  $\rm M_3$  field (or extended mnemonic) is not true, all bits in the general register specified by the  $\rm R_1$  field remain unchanged.



This slide illustrates the operation of STORE ON CONDITION (STOC).

If the condition specified in the  $M_3$  field of the instruction (or specified by the extended mnemonic) is true, bits 32-63 of the general register specified by the  $R_1$  field are stored at the four-byte second-operand location.

If the condition specified by the  $\rm M_3$  field (or extended mnemonic) is not true, no store operation occurs.



This slide illustrates the operation of STORE ON CONDITION (STOCG).

If the condition specified in the  $M_3$  field of the instruction (or specified by the extended mnemonic) is true, bits 0-63 of the general register specified by the  $R_1$  field are stored at the eight-byte second-operand location.

If the condition specified by the  $\rm M_3$  field (or extended mnemonic) is not true, no store operation occurs.
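A minimal Python model of the two store forms follows, assuming storage is represented by a `bytearray` and the operand address is a plain index into it (both are simplifications made for illustration). Values are stored big-endian, as on the real machine.

```python
MASK64 = (1 << 64) - 1


def condition_met(cc, m3):
    """The leftmost mask bit selects CC0, the rightmost selects CC3."""
    return bool(m3 & (0b1000 >> cc))


def stoc(storage, address, r1, cc, m3):
    """Store bits 32-63 of R1 (four bytes) only when the condition is met."""
    if condition_met(cc, m3):
        storage[address:address + 4] = (r1 & 0xFFFFFFFF).to_bytes(4, "big")


def stocg(storage, address, r1, cc, m3):
    """Store bits 0-63 of R1 (eight bytes) only when the condition is met."""
    if condition_met(cc, m3):
        storage[address:address + 8] = (r1 & MASK64).to_bytes(8, "big")
```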

Distinct-Operands Facility (1)

- Suite of instructions to provide nondestructive analogs to existing destructive instructions
  - Target register is separate from source registers
  - Nondestructive instructions provided for: ADD, ADD LOGICAL, ADD LOGICAL WITH SIGNED IMMEDIATE, AND, EXCLUSIVE OR, OR, SHIFT LEFT, SHIFT RIGHT, SUBTRACT, SUBTRACT LOGICAL
- Intended to provide register-constraint relief for compilers
- Installation of the distinct-operands facility (and others) indicated by facility bit 45

Beginning with the original System/360, the architecture has a long tradition of performing arithmetic or logical operations on two source operands, and then replacing one of the source operands with the result. This was completely understandable for RR-format instructions, where the instruction format only had room for two registers.

With the advent of newer instruction formats, there is sufficient space for separate source and target operand specifications. z/Architecture began exploiting this with the 64-bit shift operations, and the decimal-floating-point facility extended the practice by having the results of floating point computations placed in a register that can be distinct from the two source registers.

Having a separate destination operand register provides greater flexibility to compiler designers and assembler programmers. When a source operand needs to be preserved, extra instructions are not needed to perform a copying operation.

The distinct-operands facility introduces a series of arithmetic and logical instructions that have a result register that can be distinct from any of the source operands. For all of the instructions, the first (result) and third (source) operands are in a register; depending on the instruction, the second operand is a register, immediate field, or storage-type operand.

All of the distinct-operand-facility instructions have a suffix of "K" in the mnemonic.

| Instruction                       | Mnemonic | OpCode | 1<sup>st</sup> Operand | 2<sup>nd</sup> Operand | 3<sup>rd</sup> Operand |
|-----------------------------------|----------|--------|------------------------|------------------------|------------------------|
| ADD                               | ARK      | B9F8   | R<sub>1</sub>.32-63    | R<sub>2</sub>.32-63    | R<sub>3</sub>.32-63    |
| ADD                               | AGRK     | B9E8   | R<sub>1</sub>.0-63     | R<sub>2</sub>.0-63     | R<sub>3</sub>.0-63     |
| ADD IMMEDIATE                     | AHIK     | ECD8   | R<sub>1</sub>.32-63    | I<sub>2</sub>          | R<sub>3</sub>.32-63    |
| ADD IMMEDIATE                     | AGHIK    | ECD9   | R<sub>1</sub>.0-63     | I<sub>2</sub>          | R<sub>3</sub>.0-63     |
| ADD LOGICAL                       | ALRK     | B9FA   | R<sub>1</sub>.32-63    | R<sub>2</sub>.32-63    | R<sub>3</sub>.32-63    |
| ADD LOGICAL                       | ALGRK    | B9EA   | R<sub>1</sub>.0-63     | R<sub>2</sub>.0-63     | R<sub>3</sub>.0-63     |
| ADD LOGICAL WITH SIGNED IMMEDIATE | ALHSIK   | ECDA   | R<sub>1</sub>.32-63    | I<sub>2</sub>          | R<sub>3</sub>.32-63    |
| ADD LOGICAL WITH SIGNED IMMEDIATE | ALGHSIK  | ECDB   | R<sub>1</sub>.0-63     | I<sub>2</sub>          | R<sub>3</sub>.0-63     |
| AND                               | NRK      | B9F4   | R<sub>1</sub>.32-63    | R<sub>2</sub>.32-63    | R<sub>3</sub>.32-63    |
| AND                               | NGRK     | B9E4   | R<sub>1</sub>.0-63     | R<sub>2</sub>.0-63     | R<sub>3</sub>.0-63     |

This slide introduces the various ADD and AND instructions in the distinct-operand facility.

For the ADD instructions, the second operand is either a register or immediate field. For the AND, OR, and XOR instructions, the second operand is always a register.

| Instruction                | Mnemonic | OpCode | 1 <sup>st</sup> Operand | 2 <sup>nd</sup> Operand | 3 <sup>rd</sup> Operand |
|----------------------------|----------|--------|-------------------------|-------------------------|-------------------------|
| EXCLUSIVE OR               | XRK      | B9F7   | R <sub>1</sub>.32-63   | R <sub>2</sub>.32-63   | R <sub>3</sub>.32-63   |
| EXCLUSIVE OR               | XGRK     | B9E7   | R <sub>1</sub>.0-63    | R <sub>2</sub>.0-63    | R <sub>3</sub>.0-63    |
| OR                         | ORK      | B9F6   | R <sub>1</sub>.32-63   | R <sub>2</sub>.32-63   | R <sub>3</sub>.32-63   |
| OR                         | OGRK     | B9E6   | R <sub>1</sub>.0-63    | R <sub>2</sub>.0-63    | R <sub>3</sub>.0-63    |
| SHIFT LEFT SINGLE          | SLAK     | EBDD   | R <sub>1</sub> .32-63   | S20                     | R <sub>3</sub> .32-63   |
| SHIFT LEFT SINGLE LOGICAL  | SLLK     | EBDF   | R <sub>1</sub> .32-63   | S20                     | R <sub>3</sub> .32-63   |
| SHIFT RIGHT SINGLE         | SRAK     | EBDC   | R <sub>1</sub> .32-63   | S20                     | R <sub>3</sub> .32-63   |
| SHIFT RIGHT SINGLE LOGICAL | SRLK     | EBDE   | R <sub>1</sub> .32-63   | S20                     | R <sub>3</sub> .32-63   |
| SUBTRACT                   | SRK      | B9F9   | R <sub>1</sub> .32-63   | R <sub>2</sub> .32-63   | R <sub>3</sub> .32-63   |
| SUBTRACT                   | SGRK     | B9E9   | R <sub>1</sub> .0-63    | R <sub>2</sub> .0-63    | R <sub>3</sub> .0-63    |
| SUBTRACT LOGICAL           | SLRK     | B9FB   | R <sub>1</sub> .32-63   | R <sub>2</sub> .32-63   | R <sub>3</sub> .32-63   |
| SUBTRACT LOGICAL           | SLGRK    | B9EB   | R <sub>1</sub> .0-63    | R <sub>2</sub> .0-63    | R <sub>3</sub> .0-63    |

This slide enumerates the remaining instructions in the distinct-operand facility.

For the SHIFT instructions, the second operand is not used to access storage; rather, the rightmost six bits of the second-operand address form the shift amount (just like any other shift operation).



For ADD (ARK), the second operand is added to the third operand, and the result is placed in the first operand. Each operand occupies the rightmost 32 bits (bits 32-63) of the general register designated by the corresponding R field of the instruction.

Unless the R<sub>1</sub> field designates the same register as the R<sub>2</sub> or R<sub>3</sub> field, the contents of the general registers designated by the R<sub>2</sub> and R<sub>3</sub> fields remain unchanged. The contents of bit positions 0-31 of the general register designated by the R<sub>1</sub> field always remain unchanged.
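The following Python sketch models the 32-bit ADD (ARK), including the usual signed-add condition code (0 zero, 1 negative, 2 positive, 3 overflow). Registers are 64-bit integers and the helper names are illustrative only.

```python
LOW32 = 0xFFFFFFFF
HIGH32 = LOW32 << 32


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def ark(r1, r2, r3):
    """Return (new R1, condition code); R2 and R3 are only read."""
    total = to_signed32(r2 & LOW32) + to_signed32(r3 & LOW32)
    if not -(1 << 31) <= total < (1 << 31):
        cc = 3                      # overflow
    elif total == 0:
        cc = 0
    else:
        cc = 1 if total < 0 else 2
    # Bits 0-31 of R1 are preserved; only bits 32-63 receive the sum.
    return (r1 & HIGH32) | (total & LOW32), cc
```

Because the sources are passed in separately from the target, the caller keeps both addends, which is the point of the facility.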



For ADD (AGRK), the second operand is added to the third operand, and the result is placed in the first operand. Each operand occupies all 64 bits of the general register designated by the corresponding R field of the instruction.

Unless the R<sub>1</sub> field designates the same register as the R<sub>2</sub> or R<sub>3</sub> field, the contents of the general registers designated by the R<sub>2</sub> and R<sub>3</sub> fields remain unchanged.



For ADD IMMEDIATE (AHIK), the 16-bit signed binary integer in the  $I_2$  field of the instruction is sign extended on the left to form a 32-bit signed value which is added to the third operand. The result of this addition is placed in the first operand. The first and third operands occupy the rightmost 32 bits (bits 32-63) of the general registers designated by the  $R_1$  and  $R_3$  fields of the instruction, respectively.

Unless the  $R_1$  field designates the same register as the  $R_3$  field, the contents of the general register designated by the  $R_3$  field remain unchanged. The contents of bit positions 0-31 of the general register designated by the  $R_1$  field always remain unchanged.



For ADD IMMEDIATE (AGHIK), the 16-bit signed binary integer in the  $I_2$  field of the instruction is sign extended on the left to form a 64-bit signed value which is added to the third operand. The result of this addition is placed in the first operand. The first and third operands occupy all 64 bits of the general registers designated by the  $R_1$  and  $R_3$  fields of the instruction, respectively.

Unless the  $R_1$  field designates the same register as the  $R_3$  field, the contents of the general register designated by the  $R_3$  field remain unchanged.
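The sign extension of the 16-bit immediate is the key step in both ADD IMMEDIATE forms. A small Python sketch, with the condition code omitted for brevity and helper names chosen for illustration:

```python
MASK64 = (1 << 64) - 1
LOW32 = 0xFFFFFFFF
HIGH32 = LOW32 << 32


def sign_extend16(i2):
    """Treat the 16-bit I2 field as a two's-complement integer."""
    i2 &= 0xFFFF
    return i2 - 0x10000 if i2 & 0x8000 else i2


def ahik(r1, r3, i2):
    """32-bit form: bits 0-31 of R1 are preserved."""
    low = ((r3 & LOW32) + sign_extend16(i2)) & LOW32
    return (r1 & HIGH32) | low


def aghik(r3, i2):
    """64-bit form: the whole of R1 receives the sum."""
    return (r3 + sign_extend16(i2)) & MASK64
```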



For ADD LOGICAL (ALRK), the second operand is added to the third operand, and the result is placed in the first operand. Each operand occupies the rightmost 32 bits (bits 32-63) of the general register designated by the corresponding R field of the instruction.

Unless the R<sub>1</sub> field designates the same register as the R<sub>2</sub> or R<sub>3</sub> field, the contents of the general registers designated by the R<sub>2</sub> and R<sub>3</sub> fields remain unchanged. The contents of bit positions 0-31 of the general register designated by the R<sub>1</sub> field always remain unchanged.



For ADD LOGICAL (ALGRK), the second operand is added to the third operand, and the result is placed in the first operand. Each operand occupies all 64 bits of the general register designated by the corresponding R field of the instruction.

Unless the  $R_1$  field designates the same register as the  $R_2$  or  $R_3$  field, the contents of the general registers designated by the  $R_2$  and  $R_3$  fields remain unchanged.
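ADD LOGICAL differs from ADD in that the operands are unsigned and the condition code reports zero/nonzero and carry rather than sign and overflow. A Python sketch of the 64-bit form (the function name just echoes the mnemonic):

```python
MASK64 = (1 << 64) - 1


def algrk(r2, r3):
    """Return (result, condition code) for an unsigned 64-bit add.

    CC 0: zero, no carry      CC 1: nonzero, no carry
    CC 2: zero, carry         CC 3: nonzero, carry
    """
    total = (r2 & MASK64) + (r3 & MASK64)
    result = total & MASK64
    carry = total >> 64
    return result, (2 if carry else 0) + (1 if result else 0)
```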



For ADD LOGICAL WITH SIGNED IMMEDIATE (ALHSIK), the 16-bit signed binary integer in the  $I_2$  field of the instruction is sign extended on the left to form a 32-bit signed value which is added to the third operand. The result of this addition is placed in the first operand. The first and third operands occupy the rightmost 32 bits (bits 32-63) of the general registers designated by the  $R_1$  and  $R_3$  fields of the instruction, respectively.

Unless the  $R_1$  field designates the same register as the  $R_3$  field, the contents of the general register designated by the  $R_3$  field remain unchanged. The contents of bit positions 0-31 of the general register designated by the  $R_1$  field always remain unchanged.



For ADD LOGICAL WITH SIGNED IMMEDIATE (ALGHSIK), the 16-bit signed binary integer in the  $I_2$  field of the instruction is sign extended on the left to form a 64-bit signed value which is added to the third operand. The result of this addition is placed in the first operand. The first and third operands occupy all 64 bits of the general registers designated by the  $R_1$  and  $R_3$  fields of the instruction, respectively.

Unless the  $R_1$  field designates the same register as the  $R_3$  field, the contents of the general register designated by the  $R_3$  field remain unchanged.



For AND (NRK), the second operand is logically ANDed with the third operand, and the result is placed in the first operand. Each operand occupies the rightmost 32 bits (bits 32-63) of the general register designated by the corresponding R field of the instruction.

Unless the  $R_1$  field designates the same register as the  $R_2$  or  $R_3$  field, the contents of the general registers designated by the  $R_2$  and  $R_3$  fields remain unchanged. The contents of bit positions 0-31 of the general register designated by the  $R_1$  field always remain unchanged.



For AND (NGRK), the second operand is logically ANDed with the third operand, and the result is placed in the first operand. Each operand occupies all 64 bits of the general register designated by the corresponding R field of the instruction.

Unless the R<sub>1</sub> field designates the same register as the R<sub>2</sub> or R<sub>3</sub> field, the contents of the general registers designated by the R<sub>2</sub> and R<sub>3</sub> fields remain unchanged.



For EXCLUSIVE OR (XRK), the second operand is logically exclusive-ORed with the third operand, and the result is placed in the first operand. Each operand occupies the rightmost 32 bits (bits 32-63) of the general register designated by the corresponding R field of the instruction.

Unless the R<sub>1</sub> field designates the same register as the R<sub>2</sub> or R<sub>3</sub> field, the contents of the general registers designated by the R<sub>2</sub> and R<sub>3</sub> fields remain unchanged. The contents of bit positions 0-31 of the general register designated by the R<sub>1</sub> field always remain unchanged.



For EXCLUSIVE OR (XGRK), the second operand is logically exclusive-ORed with the third operand, and the result is placed in the first operand. Each operand occupies all 64 bits of the general register designated by the corresponding R field of the instruction.

Unless the R<sub>1</sub> field designates the same register as the R<sub>2</sub> or R<sub>3</sub> field, the contents of the general registers designated by the R<sub>2</sub> and R<sub>3</sub> fields remain unchanged.



For OR (ORK), the second operand is logically ORed with the third operand, and the result is placed in the first operand. Each operand occupies the rightmost 32 bits (bits 32-63) of the general register designated by the corresponding R field of the instruction.

Unless the R<sub>1</sub> field designates the same register as the R<sub>2</sub> or R<sub>3</sub> field, the contents of the general registers designated by the R<sub>2</sub> and R<sub>3</sub> fields remain unchanged. The contents of bit positions 0-31 of the general register designated by the R<sub>1</sub> field always remain unchanged.



For OR (OGRK), the second operand is logically ORed with the third operand, and the result is placed in the first operand. Each operand occupies all 64 bits of the general register designated by the corresponding R field of the instruction.

Unless the R<sub>1</sub> field designates the same register as the R<sub>2</sub> or R<sub>3</sub> field, the contents of the general registers designated by the R<sub>2</sub> and R<sub>3</sub> fields remain unchanged.
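The three 32-bit logical operations share one pattern, sketched here in Python: combine bits 32-63 of the two sources, keep bits 0-31 of the target, and set condition code 0 for a zero result or 1 otherwise. The helper `_logical32` is a hypothetical name used only for this illustration.

```python
LOW32 = 0xFFFFFFFF
HIGH32 = LOW32 << 32


def _logical32(op, r1, r2, r3):
    """Apply op to the low words of R2 and R3; return (new R1, CC)."""
    low = op(r2, r3) & LOW32
    return (r1 & HIGH32) | low, (1 if low else 0)


def nrk(r1, r2, r3):
    return _logical32(lambda a, b: a & b, r1, r2, r3)


def ork(r1, r2, r3):
    return _logical32(lambda a, b: a | b, r1, r2, r3)


def xrk(r1, r2, r3):
    return _logical32(lambda a, b: a ^ b, r1, r2, r3)
```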



For SHIFT LEFT SINGLE (SLAK), the 31-bit numeric part of the third operand is shifted left by the number of bits specified by the second-operand address, and the result is placed in the first operand. Zeros are supplied to the vacated bit positions on the right. The first and third operands are 32-bit signed binary integers in bits 32-63 of the respective registers, with the sign in bit position 32.

Unless the  $R_1$  field designates the same register as the  $R_3$  field, the contents of the general register designated by the  $R_3$  field remain unchanged. The contents of bit positions 0-31 of the general register designated by the  $R_1$  field always remain unchanged.

The second-operand address is not used to address data; rather, its rightmost six bits indicate the number of bit positions to be shifted. The remainder of the address is ignored.

The condition code is set based on whether the result is negative, zero, positive, or causes an overflow. An overflow occurs if one or more bits unlike the sign bit are shifted left out of bit position 33; if the fixed-point-overflow mask bit in the PSW is one, a fixed-point-overflow program interruption then occurs.
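The arithmetic left shift is the subtlest of the shift operations, so a Python sketch may help. It assumes the usual SHIFT LEFT SINGLE rules: the sign bit of the result equals the sign bit of the source, and overflow is reported when any bit unlike the sign is shifted out of the 31-bit numeric part. The program interruption is not modelled.

```python
LOW32 = 0xFFFFFFFF
HIGH32 = LOW32 << 32


def slak(r1, r3, address):
    """Return (new R1, condition code) for a 32-bit arithmetic left shift."""
    shift = address & 0x3F                  # only the rightmost six bits count
    value = r3 & LOW32
    sign = value & 0x80000000
    shifted = (value & 0x7FFFFFFF) << shift
    kept = shifted & 0x7FFFFFFF             # new 31-bit numeric part
    lost = shifted >> 31                    # bits shifted out of bit position 33
    all_like_sign = ((1 << shift) - 1) if sign else 0
    result = sign | kept
    if lost != all_like_sign:
        cc = 3                              # overflow
    elif result == 0:
        cc = 0
    else:
        cc = 1 if sign else 2
    return (r1 & HIGH32) | result, cc
```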



For SHIFT LEFT SINGLE LOGICAL (SLLK), the third operand is shifted left by the number of bits specified by the second-operand address, and the result is placed in the first operand. Zeros are supplied to the vacated bit positions on the right. The first and third operands are 32-bit unsigned binary integers occupying the rightmost 32 bits (bits 32-63) of the general registers designated by the  $R_1$  and  $R_3$  fields of the instruction, respectively.

Unless the  $R_1$  field designates the same register as the  $R_3$  field, the contents of the general register designated by the  $R_3$  field remain unchanged. The contents of bit positions 0-31 of the general register designated by the  $R_1$  field always remain unchanged.

The second-operand address is not used to address data; rather, its rightmost six bits indicate the number of bit positions to be shifted. The remainder of the address is ignored.



For SHIFT RIGHT SINGLE (SRAK), the 31-bit integer portion of the third operand is shifted right by the number of bits specified by the second-operand address, and the result is placed in the first operand. The first and third operands are 32-bit signed binary integers in bits 32-63 of the respective registers, with the sign in bit position 32. The sign bit is supplied to the vacated bit positions on the left.

Unless the  $R_1$  field designates the same register as the  $R_3$  field, the contents of the general register designated by the  $R_3$  field remain unchanged. The contents of bit positions 0-31 of the general register designated by the  $R_1$  field always remain unchanged.

The second-operand address is not used to address data; rather, its rightmost six bits indicate the number of bit positions to be shifted. The remainder of the address is ignored.

The condition code is set based on whether the results are negative, zero, or positive.
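A Python sketch of the arithmetic right shift follows. It relies on the fact that Python's `>>` on a negative integer is itself an arithmetic shift, so the sign is propagated into the vacated positions automatically; the helper names are illustrative.

```python
LOW32 = 0xFFFFFFFF
HIGH32 = LOW32 << 32


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def srak(r1, r3, address):
    """Return (new R1, condition code) for a 32-bit arithmetic right shift."""
    shift = address & 0x3F                       # rightmost six bits only
    shifted = to_signed32(r3 & LOW32) >> shift   # sign fills from the left
    if shifted == 0:
        cc = 0
    else:
        cc = 1 if shifted < 0 else 2
    return (r1 & HIGH32) | (shifted & LOW32), cc
```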



For SHIFT RIGHT SINGLE LOGICAL (SRLK), the third operand is shifted right by the number of bits specified by the second-operand address, and the result is placed in the first operand. Zeros are supplied to the vacated bit positions on the left. The first and third operands are 32-bit unsigned binary integers occupying the rightmost 32 bits (bits 32-63) of the general registers designated by the  $R_1$  and  $R_3$  fields of the instruction, respectively.

Unless the  $R_1$  field designates the same register as the  $R_3$  field, the contents of the general register designated by the  $R_3$  field remain unchanged. The contents of bit positions 0-31 of the general register designated by the  $R_1$  field always remain unchanged.

The second-operand address is not used to address data; rather, its rightmost six bits indicate the number of bit positions to be shifted. The remainder of the address is ignored.
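The two logical shifts, and the way the shift amount is taken from the second-operand address, can be sketched in Python as follows. `shift_amount` is a hypothetical helper that forms the address from a base-register value and a displacement and keeps only its rightmost six bits.

```python
LOW32 = 0xFFFFFFFF
HIGH32 = LOW32 << 32


def shift_amount(base, displacement):
    """Only the rightmost six bits of the computed address are used."""
    return (base + displacement) & 0x3F


def sllk(r1, r3, address):
    """32-bit logical left shift; bits shifted out of the word are lost."""
    low = ((r3 & LOW32) << (address & 0x3F)) & LOW32
    return (r1 & HIGH32) | low


def srlk(r1, r3, address):
    """32-bit logical right shift; zeros enter from the left."""
    low = (r3 & LOW32) >> (address & 0x3F)
    return (r1 & HIGH32) | low
```

Since the amount can be as large as 63, a shift of 32 or more simply leaves zeros in bits 32-63 of the result.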



For SUBTRACT (SRK), the third operand is subtracted from the second operand, and the result is placed in the first operand. Each operand occupies the rightmost 32 bits (bits 32-63) of the general register designated by the corresponding R field of the instruction.

Unless the R<sub>1</sub> field designates the same register as the R<sub>2</sub> or R<sub>3</sub> field, the contents of the general registers designated by the R<sub>2</sub> and R<sub>3</sub> fields remain unchanged. The contents of bit positions 0-31 of the general register designated by the R<sub>1</sub> field always remain unchanged.

The condition code is set as with all signed subtraction instructions.



For SUBTRACT (SGRK), the third operand is subtracted from the second operand, and the result is placed in the first operand. Each operand occupies all 64 bits of the general register designated by the corresponding R field of the instruction.

Unless the R<sub>1</sub> field designates the same register as the R<sub>2</sub> or R<sub>3</sub> field, the contents of the general registers designated by the R<sub>2</sub> and R<sub>3</sub> fields remain unchanged.

The condition code is set as with all signed subtraction instructions.



For SUBTRACT LOGICAL (SLRK), the third operand is subtracted from the second operand, and the result is placed in the first operand. Each operand occupies the rightmost 32 bits (bits 32-63) of the general register designated by the corresponding R field of the instruction.

Unless the R<sub>1</sub> field designates the same register as the R<sub>2</sub> or R<sub>3</sub> field, the contents of the general registers designated by the R<sub>2</sub> and R<sub>3</sub> fields remain unchanged. The contents of bit positions 0-31 of the general register designated by the R<sub>1</sub> field always remain unchanged.

The condition code is set as with all unsigned subtraction instructions.



For SUBTRACT LOGICAL (SLGRK), the third operand is subtracted from the second operand, and the result is placed in the first operand. Each operand occupies all 64 bits of the general register designated by the corresponding R field of the instruction.

Unless the R<sub>1</sub> field designates the same register as the R<sub>2</sub> or R<sub>3</sub> field, the contents of the general registers designated by the R<sub>2</sub> and R<sub>3</sub> fields remain unchanged.

The condition code is set as with all unsigned subtraction instructions.
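For reference, the unsigned-subtraction condition code can be modelled in Python as below. Note the operand order (third operand subtracted from second) and that condition code 0 is never produced, because a zero result cannot coincide with a borrow.

```python
MASK64 = (1 << 64) - 1


def slgrk(r2, r3):
    """Return (result, condition code) for an unsigned 64-bit subtract.

    CC 1: nonzero result, borrow
    CC 2: zero result, no borrow
    CC 3: nonzero result, no borrow
    """
    r2 &= MASK64
    r3 &= MASK64
    result = (r2 - r3) & MASK64
    if r2 < r3:
        cc = 1
    else:
        cc = 3 if result else 2
    return result, cc
```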



The POPULATION COUNT instruction is useful for determining the number of one bits contained in each byte of a 64-bit register. For each byte in the register designated by the  $R_2$  field of the instruction, POPCNT places an 8-bit count of the number of one bits into the corresponding byte of the general register designated by the  $R_1$  field of the instruction.
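A short Python model of the per-byte behaviour, treating the register as a 64-bit integer (the function name simply echoes the mnemonic):

```python
def popcnt(r2):
    """Return a 64-bit value whose bytes hold the one-bit counts of r2's bytes."""
    result = 0
    for position in range(0, 64, 8):
        byte = (r2 >> position) & 0xFF
        result |= bin(byte).count("1") << position
    return result
```

Each count is at most 8, so it always fits in its own byte without disturbing its neighbours.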

POPCNT may be useful in applications that use bit maps to indicate the presence, validity, or availability of some group of resources. An example of such bit-map usage may be found in Appendix A of the *z*/Architecture Principles of Operation (SA22-7832) in the programming example for the FIND LEFTMOST ONE instruction.

POPCNT provides only a count of one bits for each byte. If the application needs to know the number of one bits in larger units, it must perform its own postprocessing. The example shown illustrates a clever way of summing the eight bytes; however, on some models the MULTIPLY SINGLE instruction may be slower than an equivalent group of simpler instructions, for example:

| Operation | Operands |
|-----------|----------|
| POPCNT    | 8,15     |
| AHHLR     | 8,8,8    |
| SLLG      | 9,8,16   |
| ALGR      | 8,9      |
| SLLG      | 9,8,8    |
| ALGR      | 8,9      |
| SRLG      | 8,8,56   |
This sequence of instructions can easily be adapted to produce a count of one bits per halfword or per word.
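To see why the sequence works, it can be traced step by step in Python. This is a sketch of the register arithmetic only; AHHLR is modelled as "high word of R1 = high word of R2 + low word of R3". No partial sum ever exceeds 64, so no byte carries into its neighbour.

```python
MASK64 = (1 << 64) - 1
LOW32 = 0xFFFFFFFF


def popcnt(r2):
    """Per-byte one-bit counts, as produced by POPCNT."""
    result = 0
    for position in range(0, 64, 8):
        result |= bin((r2 >> position) & 0xFF).count("1") << position
    return result


def total_ones(r15):
    r8 = popcnt(r15)                               # POPCNT 8,15
    high = ((r8 >> 32) + (r8 & LOW32)) & LOW32     # AHHLR  8,8,8
    r8 = (high << 32) | (r8 & LOW32)               #   four pairwise sums in the high word
    r9 = (r8 << 16) & MASK64                       # SLLG   9,8,16
    r8 = (r8 + r9) & MASK64                        # ALGR   8,9
    r9 = (r8 << 8) & MASK64                        # SLLG   9,8,8
    r8 = (r8 + r9) & MASK64                        # ALGR   8,9  (total now in leftmost byte)
    return r8 >> 56                                # SRLG   8,8,56
```

Stopping one step earlier and extracting different byte positions gives the per-word or per-halfword variants mentioned above.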



The floating-point extension facility provides enhancements to the binary-floating-point (BFP) and decimal-floating-point (DFP) facilities. BFP was added to the architecture late in the life of ESA/390 (circa 1998); DFP was added in the System z9-109 (circa 2005).

For BFP, a new rounding mode – round to prepare for shorter precision – is provided. The new rounding mode may be controlled by means of a new bit in the floating-point control register, or by means of the  $M_3$  field in alternate forms of the CONVERT FROM FIXED, CONVERT TO FIXED, LOAD FP INTEGER, and LOAD ROUNDED instructions.

For most computational DFP operations, a new quantum-exception condition exists whenever the delivered DFP result is inexact, or when the result is exact and finite but the delivered quantum differs from the preferred quantum. The quantum-exception condition also applies to the DIVIDE, LOAD FP INTEGER, QUANTIZE, and REROUND instructions, but for somewhat different causes. Whether or not the quantum-exception condition results in an interruption is controlled and indicated by a new mask and flag bit, respectively, in the floating-point control register.

For both BFP and DFP, a new  $M_4$  field has been added to certain alternate forms of instructions to control the IEEE inexact-exception condition.

Finally, both BFP and DFP have new instructions, CONVERT FROM LOGICAL and CONVERT TO LOGICAL, for converting between unsigned binary integers and the respective floating-point formats.

| Instruction                              | Mnemonic | Format | Opcode |
|------------------------------------------|----------|--------|--------|
| CONVERT FROM LOGICAL (Extended BFP ← 32) | CXLFBR   | RRF    | B392   |
| CONVERT FROM LOGICAL (Long BFP ← 32)     | CDLFBR   | RRF    | B391   |
| CONVERT FROM LOGICAL (Short BFP ← 32)    | CELFBR   | RRF    | B390   |
| CONVERT FROM LOGICAL (Extended BFP ← 64) | CXLGBR   | RRF    | B3A2   |
| CONVERT FROM LOGICAL (Long BFP ← 64)     | CDLGBR   | RRF    | B3A1   |
| CONVERT FROM LOGICAL (Short BFP ← 64)    | CELGBR   | RRF    | B3A0   |
| CONVERT TO LOGICAL (32 ← Extended BFP)   | CLFXBR   | RRF    | B39E   |
| CONVERT TO LOGICAL (32 ← Long BFP)       | CLFDBR   | RRF    | B39D   |
| CONVERT TO LOGICAL (32 ← Short BFP)      | CLFEBR   | RRF    | B39C   |
| CONVERT TO LOGICAL (64 ← Extended BFP)   | CLGXBR   | RRF    | B3AE   |
| CONVERT TO LOGICAL (64 ← Long BFP)       | CLGDBR   | RRF    | B3AD   |
| CONVERT TO LOGICAL (64 ← Short BFP)      | CLGEBR   | RRF    | B3AC   |
| SET BFP ROUNDING MODE                    | SRNMB    | S      | B2B8   |

This slide illustrates the new BFP instructions.

The majority of the instructions are various forms of the CONVERT FROM LOGICAL and CONVERT TO LOGICAL instructions. CONVERT FROM LOGICAL converts an unsigned binary integer in the second operand to a binary-floating-point value that is placed in the first operand. CONVERT TO LOGICAL rounds a binary-floating-point value in the second operand to an integer value and then converts it to unsigned fixed-point format in the first operand.
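The difference between the signed CONVERT FROM FIXED forms and the unsigned CONVERT FROM LOGICAL forms can be sketched in Python. This is only an illustration of the semantics described above, not of the instructions' exact behavior. The function names are hypothetical, Python's `float` stands in for long BFP, round-half-to-even is assumed as the rounding method, and the clamping of out-of-range values is an assumption of this sketch.

```python
import struct

MASK32 = 0xFFFFFFFF

def convert_from_fixed32(bits: int) -> float:
    """Treat a 32-bit pattern as a signed integer, then convert it to float."""
    (signed,) = struct.unpack(">i", struct.pack(">I", bits & MASK32))
    return float(signed)

def convert_from_logical32(bits: int) -> float:
    """Treat the same 32-bit pattern as an unsigned integer."""
    return float(bits & MASK32)

def convert_to_logical32(value: float) -> int:
    """Round to an integer, then return it as an unsigned 32-bit value.

    ASSUMPTION: out-of-range results are clamped to [0, 2**32 - 1];
    the architected out-of-range behavior is not modeled here.
    """
    rounded = round(value)  # Python's round() rounds ties to even
    return max(0, min(MASK32, rounded))

print(convert_from_fixed32(0xFFFFFFFF))
print(convert_from_logical32(0xFFFFFFFF))
print(convert_to_logical32(2.5))
```

The same bit pattern, X'FFFFFFFF', is -1 when treated as signed and 4294967295 when treated as logical, which is why separate unsigned conversions are useful to languages with unsigned integer types.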

SET BFP ROUNDING MODE (SRNM) was the original instruction to set the 2-bit BFP rounding mode in the floating-point control register (FPCR). The new SRNMB instruction sets the 3-bit BFP rounding mode in the FPCR. SRNMB provides a complete superset of the functionality of SRNM (SRNM is now deprecated).
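The superset relationship between the 2-bit and 3-bit forms can be sketched as follows. The function names are hypothetical, and the placement of the rounding-mode field in the low-order three bits of a 32-bit control word (with the 2-bit form clearing the extra bit) is an assumption made for illustration.

```python
# ASSUMPTION: the rounding-mode field is the low-order 3 bits of the word.
MODE_FIELD_MASK = 0b111
WORD_MASK = 0xFFFFFFFF

def srnm_like(fpcr: int, mode: int) -> int:
    """Original form: accepts only a 2-bit rounding mode (0..3)."""
    if not 0 <= mode <= 0b11:
        raise ValueError("2-bit rounding mode expected")
    return (fpcr & ~MODE_FIELD_MASK & WORD_MASK) | mode

def srnmb_like(fpcr: int, mode: int) -> int:
    """New form: accepts a 3-bit rounding mode (0..7)."""
    if not 0 <= mode <= 0b111:
        raise ValueError("3-bit rounding mode expected")
    return (fpcr & ~MODE_FIELD_MASK & WORD_MASK) | mode

# Every mode the 2-bit form can set, the 3-bit form sets identically.
for mode in range(4):
    assert srnm_like(0x12345677, mode) == srnmb_like(0x12345677, mode)
```

Only the 3-bit form can select modes 4 through 7, which is where any newly defined rounding methods would have to be encoded.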

| Instruction                            | Mnemonic | Format | Opcode |
|----------------------------------------|----------|--------|--------|
| CONVERT FROM FIXED (Extended BFP ← 32) | CXFBRA   | RRF    | B396   |
| CONVERT FROM FIXED (Long BFP ← 32)     | CDFBRA   | RRF    | B395   |
| CONVERT FROM FIXED (Short BFP ← 32)    | CEFBRA   | RRF    | B394   |
| CONVERT FROM FIXED (Extended BFP ← 64) | CXGBRA   | RRF    | B3A6   |
| CONVERT FROM FIXED (Long BFP ← 64)     | CDGBRA   | RRF    | B3A5   |
| CONVERT FROM FIXED (Short BFP ← 64)    | CEGBRA   | RRF    | B3A4   |
| CONVERT TO FIXED (32 ← Extended BFP)   | CFXBRA   | RRF    | B39A   |
| CONVERT TO FIXED (32 ← Long BFP)       | CFDBRA   | RRF    | B399   |
| CONVERT TO FIXED (32 ← Short BFP)      | CFEBRA   | RRF    | B398   |
| CONVERT TO FIXED (64 ← Extended BFP)   | CGXBRA   | RRF    | B3AA   |
| CONVERT TO FIXED (64 ← Long BFP)       | CGDBRA   | RRF    | B3A9   |
| CONVERT TO FIXED (64 ← Short BFP)      | CGEBRA   | RRF    | B3A8   |
| LOAD FP INTEGER (Extended BFP)         | FIXBRA   | RRF    | B347   |
| LOAD FP INTEGER (Long BFP)             | FIDBRA   | RRF    | B35F   |
| LOAD FP INTEGER (Short BFP)            | FIEBRA   | RRF    | B357   |
| LOAD ROUNDED (Long BFP ← Extended)     | LDXBRA   | RRF    | B345   |
| LOAD ROUNDED (Short BFP ← Extended)    | LEXBRA   | RRF    | B346   |
| LOAD ROUNDED (Short BFP ← Long)        | LEDBRA   | RRF    | B344   |

This slide illustrates alternate forms of existing BFP instructions, as indicated by the "A" suffix on the mnemonic. The actual operation codes for these instructions are identical to those generated from mnemonics without the A, but the High-Level Assembler recognizes new operands when the "A" suffix is present.

For CONVERT FROM FIXED and LOAD ROUNDED, the alternate-mnemonic forms add both an $M_3$ and an $M_4$ operand. The $M_3$ operand provides a rounding control, and the $M_4$ operand provides the IEEE-inexact-exception control. For CONVERT TO FIXED and LOAD FP INTEGER, a rounding control is already provided in the form of the $M_3$ field, but the new $M_4$ operand provides the IEEE-inexact-exception control. For each of these instructions, and for DIVIDE TO INTEGER, the new rounding method (round to prepare for shorter precision) may be specified.
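Why a rounding method that "prepares for shorter precision" is useful can be sketched with exact rational arithmetic. The sketch assumes that the method behaves like the technique commonly called round-to-odd (truncate, then force the last retained bit to one if anything was discarded); that equivalence is an assumption here, and the definitive definition is the one in the Principles of Operation. The helper names are hypothetical and only non-negative values are handled.

```python
from fractions import Fraction
from math import floor

def round_nearest_even(x: Fraction, frac_bits: int) -> Fraction:
    """Round non-negative x to frac_bits fractional bits, ties to even."""
    scaled = x * 2**frac_bits
    kept = floor(scaled)
    remainder = scaled - kept
    half = Fraction(1, 2)
    if remainder > half or (remainder == half and kept % 2 == 1):
        kept += 1
    return Fraction(kept, 2**frac_bits)

def round_to_odd(x: Fraction, frac_bits: int) -> Fraction:
    """Truncate non-negative x; if anything was discarded, set the last kept bit."""
    scaled = x * 2**frac_bits
    kept = floor(scaled)
    if scaled != kept:
        kept |= 1
    return Fraction(kept, 2**frac_bits)

x = Fraction(41, 16)  # 2.5625
direct = round_nearest_even(x, 0)
double_rounded = round_nearest_even(round_nearest_even(x, 2), 0)
prepared = round_nearest_even(round_to_odd(x, 2), 0)
print(direct, double_rounded, prepared)
```

Rounding 2.5625 directly to an integer gives 3. Rounding first to two fractional bits with ties-to-even gives 2.5, and the second rounding then lands on 2, a double-rounding error. Preparing the intermediate result with round-to-odd gives 2.75, so the final rounding again yields 3.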

| Instruction                              | Mnemonic | Format | Opcode |
|------------------------------------------|----------|--------|--------|
| CONVERT FROM FIXED (Extended DFP ← 32)   | CXFTR    | RRF    | B959   |
| CONVERT FROM FIXED (Long DFP ← 32)       | CDFTR    | RRF    | B951   |
| CONVERT FROM LOGICAL (Extended DFP ← 32) | CXLFTR   | RRF    | B95B   |
| CONVERT FROM LOGICAL (Long DFP ← 32)     | CDLFTR   | RRF    | B953   |
| CONVERT FROM LOGICAL (Extended DFP ← 64) | CXLGTR   | RRF    | B95A   |
| CONVERT FROM LOGICAL (Long DFP ← 64)     | CDLGTR   | RRF    | B952   |
| CONVERT TO FIXED (32 ← Extended DFP)     | CFXTR    | RRF    | B949   |
| CONVERT TO FIXED (32 ← Long DFP)         | CFDTR    | RRF    | B941   |
| CONVERT TO LOGICAL (32 ← Extended DFP)   | CLFXTR   | RRF    | B94B   |
| CONVERT TO LOGICAL (32 ← Long DFP)       | CLFDTR   | RRF    | B943   |
| CONVERT TO LOGICAL (64 ← Extended DFP)   | CLGXTR   | RRF    | B94A   |
| CONVERT TO LOGICAL (64 ← Long DFP)       | CLGDTR   | RRF    | B942   |

This slide illustrates the new DFP instructions.

As with BFP, most of the new DFP instructions are various forms of the CONVERT FROM LOGICAL and CONVERT TO LOGICAL instructions; the table also lists 32-bit forms of CONVERT FROM FIXED and CONVERT TO FIXED. CONVERT FROM LOGICAL converts an unsigned binary integer in the second operand to a decimal-floating-point value that is placed in the first operand. CONVERT TO LOGICAL rounds a decimal-floating-point value in the second operand to an integer value and then converts it to unsigned fixed-point format in the first operand.
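A conversion from an unsigned integer to DFP can be approximated with Python's `decimal` module. This is a sketch under stated assumptions: the function name is hypothetical, long and extended DFP are modeled only by their 16-digit and 34-digit precisions, and round-half-even is assumed as the rounding method in effect.

```python
from decimal import Context, Decimal, Inexact, ROUND_HALF_EVEN

LONG_DFP_DIGITS = 16      # precision of the long DFP format
EXTENDED_DFP_DIGITS = 34  # precision of the extended DFP format

def convert_from_logical(value: int, bits: int, digits: int):
    """Convert an unsigned integer of the given width to a decimal value.

    Returns the result and whether the conversion was inexact.
    ASSUMPTION: round-half-even stands in for the rounding mode in effect.
    """
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the unsigned operand")
    ctx = Context(prec=digits, rounding=ROUND_HALF_EVEN, traps=[])
    result = ctx.create_decimal(value)  # applies the context's precision
    return result, bool(ctx.flags[Inexact])

print(convert_from_logical(2**64 - 1, 64, LONG_DFP_DIGITS))
print(convert_from_logical(2**64 - 1, 64, EXTENDED_DFP_DIGITS))
```

The largest 64-bit unsigned value has 20 decimal digits, so it cannot be held exactly in the 16 digits of long DFP and the conversion is inexact; in extended DFP it is exact.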

| Instruction                            | Mnemonic | Format | Opcode |
|----------------------------------------|----------|--------|--------|
| ADD (Extended DFP)                     | AXTRA    | RRF    | B3DA   |
| ADD (Long DFP)                         | ADTRA    | RRF    | B3D2   |
| CONVERT FROM FIXED (Extended DFP ← 64) | CXGTRA   | RRF    | B3F9   |
| CONVERT FROM FIXED (Long DFP ← 64)     | CDGTRA   | RRF    | B3F1   |
| CONVERT TO FIXED (64 ← Extended DFP)   | CGXTRA   | RRF    | B3E9   |
| CONVERT TO FIXED (64 ← Long DFP)       | CGDTRA   | RRF    | B3E1   |
| DIVIDE (Extended DFP)                  | DXTRA    | RRF    | B3D9   |
| DIVIDE (Long DFP)                      | DDTRA    | RRF    | B3D1   |
| MULTIPLY (Extended DFP)                | MXTRA    | RRF    | B3D8   |
| MULTIPLY (Long DFP)                    | MDTRA    | RRF    | B3D0   |
| SUBTRACT (Extended DFP)                | SXTRA    | RRF    | B3DB   |
| SUBTRACT (Long DFP)                    | SDTRA    | RRF    | B3D3   |

This slide illustrates alternate forms of existing DFP instructions, as indicated by the "A" suffix on the mnemonic. The actual operation codes for these instructions are identical to those generated from mnemonics without the A, but the High-Level Assembler recognizes new operands when the "A" suffix is present.

For the arithmetic operations, ADD, DIVIDE, MULTIPLY, and SUBTRACT, a new $M_4$ operand is provided to control the rounding mode of the result.

For CONVERT FROM FIXED, a new $M_3$ operand is provided to control the rounding mode of the result, and a new $M_4$ operand provides the IEEE-inexact-exception control.

For CONVERT TO FIXED, a new $M_4$ operand provides the IEEE-inexact-exception control.

Also, for all DFP instructions for which a rounding mode exists in the base architecture (i.e., the $M_3$ field of CONVERT TO FIXED, LOAD FP INTEGER, LOAD ROUNDED, QUANTIZE, and REROUND), additional rounding methods are available.
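The effect of a per-instruction rounding control on an arithmetic operation can be sketched with Python's `decimal` module. The function and parameter names are hypothetical; `explicit_rounding=None` merely stands in for "use the current rounding mode" and does not reflect the actual encoding of the $M_4$ field.

```python
from decimal import Context, Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_HALF_EVEN

def dfp_add(a: Decimal, b: Decimal, explicit_rounding=None,
            current_mode=ROUND_HALF_EVEN, digits: int = 16) -> Decimal:
    """Add two values at long-DFP precision (16 digits).

    If explicit_rounding is given, it overrides the current mode for
    this one operation only, which is the idea behind the new operand.
    """
    mode = explicit_rounding if explicit_rounding is not None else current_mode
    return Context(prec=digits, rounding=mode).add(a, b)

a = Decimal("1000000000000000")  # already 16 digits
b = Decimal("0.5")               # exact sum needs 17 digits

print(dfp_add(a, b))                                   # current mode
print(dfp_add(a, b, explicit_rounding=ROUND_CEILING))  # per-operation override
print(dfp_add(a, b, explicit_rounding=ROUND_DOWN))
```

Without such an operand, a program wanting a different rounding for a single operation would have to change the rounding mode in the FPCR before the operation and restore it afterward.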



The message-security-assist extension 3 (MSA-X3) was introduced in the System z10 at general-availability level 3 (November 2009). Although it is not new in the z196, we'll devote a few slides to it, as it hasn't been published before.

MSA-X3 provides a means to protect user cryptographic keys by encrypting them under machine-generated wrapping keys. When this extension is installed, two wrapping keys are provided for each configuration: one for protecting user DEA keys and another for protecting user AES keys. The wrapping keys reside in the machine so that, with an appropriate setting of controls, no clear value of user cryptographic keys is observed anywhere in the system by any program.

The message-security-assist extension 3 may be available on models implementing the message-security assist. The extension provides the following features:

• A 256-Bit AES Wrapping-Key Register: The register contents are used to protect user AES keys.

• A 256-Bit AES Wrapping-Key Verification-Pattern Register: The register contents are used to identify the version of the AES wrapping key.

• A 192-Bit DEA Wrapping-Key Register: The register contents are used to protect user DEA keys.

• A 192-Bit DEA Wrapping-Key Verification-Pattern Register: The register contents are used to identify the version of the DEA wrapping key.

A new section has been added to the back of the General Instructions chapter of the *z/Architecture Principles of Operation* describing the protection of cryptographic keys.
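The idea of a protected key (a user key encrypted under a wrapping key that never leaves the machine, tagged with a verification pattern) can be sketched with a toy model. Everything here is hypothetical: the class and method names, the hash-based stand-in for the wrapping operation (which is not DEA or AES and is not secure), and the way the verification pattern is derived and checked.

```python
import hashlib
import hmac
import os

class ToyConfiguration:
    """Toy model of a configuration that holds a hidden wrapping key."""

    def __init__(self) -> None:
        self._wrapping_key = os.urandom(32)  # never exposed to callers
        # ASSUMPTION: the pattern is a digest that identifies the key version.
        self.verification_pattern = hashlib.sha256(
            b"pattern" + self._wrapping_key).digest()

    def _pad(self, length: int) -> bytes:
        # Toy stand-in for encryption under the wrapping key (NOT secure).
        return hashlib.sha256(b"pad" + self._wrapping_key).digest()[:length]

    def wrap(self, clear_key: bytes):
        """Import a clear key: return its wrapped form and the pattern."""
        if len(clear_key) > 32:
            raise ValueError("toy model supports keys up to 32 bytes")
        pad = self._pad(len(clear_key))
        wrapped = bytes(k ^ p for k, p in zip(clear_key, pad))
        return wrapped, self.verification_pattern

    def mac_with_protected_key(self, wrapped: bytes, pattern: bytes,
                               message: bytes) -> bytes:
        """Use a protected key without ever returning its clear value."""
        if not hmac.compare_digest(pattern, self.verification_pattern):
            raise ValueError("protected key was wrapped under another key")
        pad = self._pad(len(wrapped))
        clear_key = bytes(w ^ p for w, p in zip(wrapped, pad))
        return hmac.new(clear_key, message, hashlib.sha256).digest()

machine = ToyConfiguration()
wrapped_key, pattern = machine.wrap(b"0123456789abcdef")
tag = machine.mac_with_protected_key(wrapped_key, pattern, b"payload")
print(tag.hex())
```

After the import step, a program only needs to keep `wrapped_key` and `pattern`; a protected key presented to a configuration with a different wrapping key is rejected rather than silently producing wrong results.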



PERFORM CRYPTOGRAPHIC KEY MANAGEMENT OPERATION (PCKMO) is a control (privileged) instruction that provides a means of importing clear cryptographic keys.



New functions are also added to the existing CIPHER MESSAGE (KM), CIPHER MESSAGE WITH CHAINING (KMC), and COMPUTE MESSAGE AUTHENTICATION CODE (KMAC) instructions that allow encryption to be performed using the <u>encrypted</u> keys.



The message-security-assist extension 4 (MSA-X4) is introduced with the IBM zEnterprise 196. It requires that the MSA-X3 facility also be installed.

MSA-X4 provides support for cipher feedback (CFB) mode, output feedback (OFB) mode, and counter (CTR) mode of encryption and decryption. Additionally, primitive operations are provided to facilitate support for the cipher-based message-authentication code (CMAC) mode, the counter with cipher-block-chaining message-authentication code (CCM) mode, the Galois/counter mode (GCM), and the XTS mode.

| MSA-X4 New Instructions             |                     |
|-------------------------------------|---------------------|
| CIPHER MESSAGE WITH CFB (KMF)       |                     |
| CIPHER MESSAGE WITH COUNTER (KMCTR) |                     |
| CIPHER MESSAGE WITH OFB (KMO)       |                     |
| Functions:                          |                     |
| Query                               |                     |
| DEA                                 | AES 128             |
| TDEA                                | AES 192             |
| TDEA 192                            | AES 256             |
| Encrypted DEA                       | Encrypted AES 128   |
| Encrypted TDEA                      | Encrypted AES 192   |
| Encrypted TDEA 192                  | Encrypted AES 256   |

MSA-X4 introduces four new instructions, three of which are enumerated on this slide:

• CIPHER MESSAGE WITH CFB (KMF) [cipher feedback mode]

• CIPHER MESSAGE WITH COUNTER (KMCTR) [counter mode]

• CIPHER MESSAGE WITH OFB (KMO) [output feedback mode]

Each of these instructions provides the common suite of functions listed. For each basic type of function, there is a corresponding encrypted-key version.
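The three modes named above share one property: they use only the forward direction of the block cipher, for both encryption and decryption. The sketch below shows the textbook full-block form of each mode. It does not model the instructions' operands or parameter blocks, and `toy_block_encrypt` is a hypothetical hash-based stand-in for the DEA or AES block function, chosen only so that the example runs with the standard library.

```python
import hashlib

BLOCK = 16  # block size in bytes

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for a block cipher's forward function (NOT AES or DEA)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))  # truncates to the shorter input

def ctr_crypt(key: bytes, counter: int, data: bytes) -> bytes:
    """Counter mode: encrypt successive counter values, XOR with the data."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        stream = toy_block_encrypt(key, counter.to_bytes(BLOCK, "big"))
        out += xor(data[i:i + BLOCK], stream)
        counter = (counter + 1) % (1 << (8 * BLOCK))
    return bytes(out)

def ofb_crypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Output feedback: the cipher output is fed back as the next input."""
    out, feedback = bytearray(), iv
    for i in range(0, len(data), BLOCK):
        feedback = toy_block_encrypt(key, feedback)
        out += xor(data[i:i + BLOCK], feedback)
    return bytes(out)

def cfb_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Cipher feedback: the previous ciphertext block is the next input."""
    out, feedback = bytearray(), iv
    for i in range(0, len(data), BLOCK):
        cipher_block = xor(data[i:i + BLOCK], toy_block_encrypt(key, feedback))
        out += cipher_block
        feedback = cipher_block
    return bytes(out)

def cfb_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    out, feedback = bytearray(), iv
    for i in range(0, len(data), BLOCK):
        cipher_block = data[i:i + BLOCK]
        out += xor(cipher_block, toy_block_encrypt(key, feedback))
        feedback = cipher_block
    return bytes(out)

key, iv = b"k" * 16, bytes(range(16))
message = b"thirty-seven bytes of sample payload."
assert ctr_crypt(key, 1, ctr_crypt(key, 1, message)) == message
assert ofb_crypt(key, iv, ofb_crypt(key, iv, message)) == message
assert cfb_decrypt(key, iv, cfb_encrypt(key, iv, message)) == message
```

CTR and OFB are their own inverses, so one routine serves for both directions; CFB needs separate encrypt and decrypt routines because the feedback is taken from the ciphertext.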



The fourth of the new MSA-X4 instructions is PERFORM CRYPTOGRAPHIC COMPUTATION (PCC). This instruction provides primitive operations that support the cipher-based message-authentication code (CMAC) mode and the XTS mode.

| Existing Instruction                       | New MSA-X4 Functions                                                   |
|--------------------------------------------|------------------------------------------------------------------------|
| CIPHER MESSAGE (KM)                        | XTS AES 128, XTS AES 256, Encrypted XTS AES 128, Encrypted XTS AES 256 |
| COMPUTE INTERMEDIATE MESSAGE DIGEST (KIMD) | GHASH                                                                  |
| COMPUTE MESSAGE AUTHENTICATION CODE (KMAC) | AES 128, AES 192, AES 256                                              |

MSA-X4 also adds new functions to existing message-security-assist instructions.

For CIPHER MESSAGE (KM), functions supporting the XTS and encrypted XTS modes are provided.

For COMPUTE INTERMEDIATE MESSAGE DIGEST (KIMD), a function is provided in support of the Galois/counter mode hashing.

For COMPUTE MESSAGE AUTHENTICATION CODE (KMAC), three new advanced-encryption-standard (AES) functions are provided.



The enhancements described on this slide are changes to existing general instructions to provide improved performance or new function.

For as long as I can remember, the BRANCH ON CONDITION (BCR) instruction has caused serialization and checkpoint synchronization to occur when the $M_1$ and $R_2$ fields of the instruction contain 1111 and 0000 binary, respectively. Without getting into tedious details of machine-check recovery, there may be situations where a program wants to effect a serialization operation, but doesn't care about checkpoint synchronization. A new form of BCR causes serialization only, without checkpoint synchronization, when the $M_1$ and $R_2$ fields of the instruction contain 1110 and 0000 binary, respectively.
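The two special BCR encodings differ in a single bit of the mask field. The sketch below encodes the two-byte RR-format instruction (operation code X'07', then the 4-bit $M_1$ and $R_2$ fields) and classifies it according to the paragraph above. The function names are hypothetical, and the classification assumes a machine on which the new form is available.

```python
def encode_bcr(m1: int, r2: int) -> bytes:
    """Encode BCR M1,R2 as two bytes: opcode X'07', then M1 and R2 nibbles."""
    if not (0 <= m1 <= 15 and 0 <= r2 <= 15):
        raise ValueError("M1 and R2 are 4-bit fields")
    return bytes([0x07, (m1 << 4) | r2])

def describe_bcr(instruction: bytes) -> str:
    """Classify a BCR instruction as described in the text above."""
    if len(instruction) != 2 or instruction[0] != 0x07:
        raise ValueError("not a BCR instruction")
    m1, r2 = instruction[1] >> 4, instruction[1] & 0x0F
    if r2 == 0 and m1 == 0b1111:
        return "serialization and checkpoint synchronization"
    if r2 == 0 and m1 == 0b1110:
        return "serialization only"
    return "ordinary BRANCH ON CONDITION"

for m1 in (15, 14):
    code = encode_bcr(m1, 0)
    print(code.hex().upper(), "->", describe_bcr(code))
```

In assembler terms these are `BCR 15,0` and `BCR 14,0`; neither branches, because a zero $R_2$ field designates no branch address.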

MONITOR CALL provides a means by which a program can – with operating-system assistance – cause monitor-event program interruptions to occur during the execution of a program. The O/S can use these interruptions to count, measure, or otherwise observe the execution of the program. If the O/S does not enable the monitor class specified in the MC instruction (via control register 8), the instruction is effectively a no-op. This type of program measurement is expensive and tends to perturb the condition being measured. The enhanced-monitor facility provides a means by which MONITOR CALL can be used to effect the counting of events in a program – without a program interruption and (other than set-up of a counting array) without operating-system intervention.

COMPRESSION CALL is performed by a specialized component in the CPU that operates best when processing – and storing – data in larger chunks than just a byte. A new zero-padding control on the CMPSC instruction allows the instruction to operate in this more efficient manner when storing the last bytes of a result. The default zero-padding-control value of zero causes the CMPSC instruction to operate as originally defined, ensuring complete compatibility with the original architecture. However, we recommend that all users of CMPSC set the zero-padding control to one for potentially improved performance.



The enhancements listed on this slide are all tweaks to control instructions.

As originally defined, INVALIDATE PAGE TABLE ENTRY (IPTE) sets the invalid bit to one in a PTE, and then signals all CPUs in the configuration to purge (at least) that entry from their translation-lookaside buffers (TLBs). Signaling and waiting for the acknowledgement of the TLB purging is a time-consuming operation, especially if a large number of PTEs are being invalidated in bulk. The IPTE-range facility provides a new operand to the instruction that designates the number of PTEs to be invalidated. This allows the instruction to signal other CPUs once to invalidate a block of contiguous PTEs, rather than signaling once per PTE.
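The saving can be put in rough numbers with a small model that counts how many purge signals the other CPUs receive. The function and its parameters are hypothetical; in particular, `max_per_execution` is an illustrative limit on how many PTEs one execution may cover, not a value taken from the architecture.

```python
def purge_signals(pte_count: int, other_cpus: int,
                  use_range: bool, max_per_execution: int = 256) -> int:
    """Count signals sent to other CPUs when invalidating contiguous PTEs.

    Without the range operand, each PTE needs its own execution.
    ASSUMPTION: with it, one execution covers up to max_per_execution PTEs.
    """
    if pte_count < 0 or other_cpus < 0 or max_per_execution < 1:
        raise ValueError("invalid argument")
    if use_range:
        executions = -(-pte_count // max_per_execution)  # ceiling division
    else:
        executions = pte_count
    return executions * other_cpus

print(purge_signals(256, 7, use_range=False))
print(purge_signals(256, 7, use_range=True))
```

Under these assumptions, invalidating 256 contiguous PTEs on an 8-way configuration drops from 1792 signals to 7, and each signal also carries an acknowledgement wait.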

The SET STORAGE KEY EXTENDED (SSKE) instruction signals other CPUs of changes to the storage key to ensure that all CPUs observe a consistent key value. A signal to change the key may cause other CPUs to become quiesced to ensure that they are not accessing the storage in which the key is being changed. However, in certain situations the O/S may be able to ensure that other CPUs are not accessing the block (e.g., when a block is not mapped to a virtual address space). In such situations, performance may be improved by bypassing the quiesce operation. The nonquiescing-SSKE enhancement provides a new control on the SSKE instruction that causes the quiescing operation to be skipped. For compatibility purposes, the default (0) value of the control causes quiescing.

The reset-reference-bits-multiple facility provides a new control to the RESET REFERENCE BIT EXTENDED (RRBE) instruction, to allow it to reset the reference bits for multiple contiguous blocks of real storage with one execution of the instruction.



The old saw about effective presentations states, "tell them what you're going to say, say it, tell them what you've just said!" We're at the third point of that teaching, and as the past 99 slides illustrate, I've said a lot (or if you're reading these slides, you've read a lot).

The IBM zEnterprise introduces a wide variety of new CPU facilities. Some are simply designed to provide new or extended functions; most, however, are designed to provide improved performance.

Exploiting these new instructions may yield significant performance improvement. The high-word and distinct-operand facilities may provide register-constraint relief to certain applications. The interlocked-access and load-and-store-on-condition facilities may provide reduced instruction path length; the interlocked-access facility is particularly useful in MP applications.

The enhanced-floating-point facility provides additional function for floating-point applications, and the MSA-X3 and MSA-X4 facilities provide powerful operations for cryptographic and security applications.

In addition to improved performance and function, exploitation of these facilities may yield simpler code paths, thus making program execution faster and program debugging easier.



For those in the live audience, I will gladly entertain questions here.

For those who view this on the SHARE web site, your questions are also welcome. My email address is listed on the first slide.